FORMAL DEVELOPMENT OF OPEN DISTRIBUTED SYSTEMS: INTEGRATION OF UML AND PVS Doctoral Dissertation

by Demissie Bediye Aredo

Submitted to the Faculty of Mathematics and Natural Sciences at the University of Oslo in partial fulfilment of the requirements for the degree Dr. Scient. in Computer Science

August 2004

To my sister Dirribee Bediye Aredo

Abstract

This thesis reports research on the formalization of the Unified Modeling Language (UML) notations. Formal semantic definitions for UML modeling constructs are provided by systematically transforming them into suitable, well-defined entities in the specification language of the Prototype Verification System (PVS). Because UML is an industry-standard modeling language that spans several aspects of object-oriented modeling, it is not feasible to cover all semantic aspects of its notations. The thesis therefore focuses on static structural models (class diagrams) and dynamic behavioral models (sequence diagrams and statecharts). It proposes a strategy for deriving semantic models directly from UML graphical models, and a framework for integrating the UML modeling techniques with the formal analysis techniques of the PVS environment.

Transformation of UML graphical models into PVS specifications results in semantic models that are amenable to rigorous analysis, thereby overcoming limitations inherent in the semi-formal UML notations. This paves the way for formal techniques that support rigorous development of distributed systems through transformation and enhancement of OO modeling techniques. Integrating semi-formal graphical modeling techniques with mathematically based development methods results in a development framework that supports rigorous model analysis while preserving the useful features of the graphical modeling techniques.
Automating the derivation of formal specifications from graphical UML models, based on the proposed semantics, is vital because model analysis usually involves manipulating large volumes of information. To this end, we have developed a prototype CASE tool that integrates the general-purpose PVS tool set with a UML CASE tool. The tool supports formal development of distributed systems from requirement capture to code generation, and allows developers to work with the graphical models they have developed while the rigorous analysis is performed at the back-end.

This work contributes to the ongoing effort to provide formal semantics for the UML notations, with the aim of clarifying and disambiguating the language as well as supporting the development of semantically based CASE tools. Moreover, it allows exploitation of the synergy between formal methods (FMs) and semi-formal modeling languages, which in turn improves the use of FMs in industrial settings.

Acknowledgements

This work was financially supported by a grant from the Research Council of Norway under the research program for distributed IT-systems. Additional funding was provided by the Department of Informatics, University of Oslo, Norway. The work was carried out at the Department of Informatics, University of Oslo, and the Institute for Energy Technology (IFE), Halden, Norway, from February 1998 to March 2001.

I would like to thank my supervisors, Prof. Olaf Owe and Dr. Wenhui Zhang, for their follow-up, encouragement, and invaluable comments, without which this work would not have come to completion. I am indebted to my earlier supervisor, Prof. Ketil Stølen, who guided me through the early months of 'chaos' and confusion. Colleagues who worked on the ADAPT-FT project in general, and Drs. Issa Traoré, Isabelle Ryl, and Einar Johnsen in particular, deserve special thanks for their support.
I will always remember the informal and friendly atmosphere I enjoyed with the personnel and academic staff at the Department of Informatics, University of Oslo. I am grateful to all staff members at the Department, in particular Mr. Narve Trædal for his courage in dealing with the administrative side of the thesis work; thanks to him, most of the formal procedures went unnoticed. I had the pleasure of staying at IFE, in Halden, during my PhD candidacy. The people at IFE are all wonderful, and their support made the completion of this work possible. I am also grateful to the Research Council of Norway for the financial support, a crucial component for the successful completion of this thesis.

I am also thankful to the Department of Computer Science at the University of Kent at Canterbury (UKC) for allowing me to use the facilities in their Computing Laboratory. Dr. Stuart Kent and Prof. Keith Mander deserve special thanks for expressing their interest in my work, and above all for making my stay at UKC so comfortable.

Finally, my most sincere thanks go to my family for their patience and for their support in every way possible throughout the years. They have suffered my absence.

August 2004, Oslo, Norway
Demissie B. Aredo

Table of Contents

Abstract
Acknowledgements
Table of Contents
Executive Summary
1 Introduction
  1.1 Background
  1.2 The Problem Statement
  1.3 Formal Methods
  1.4 Involved Notations and Formalisms
    1.4.1 The Prototype Verification System
    1.4.2 The Unified Modeling Language
  1.5 Formal Semantic Definitions
2 Formalization of UML Notations
  2.1 Motivation
  2.2 Formalization Approaches
  2.3 State-of-the-Art
  2.4 Formalization Issues
    2.4.1 Composition of UML Models
    2.4.2 Checking Consistency of UML models
    2.4.3 Refinement
    2.4.4 Formal Reasoning
3 Summary of Contributions
  3.1 Formal Development of Distributed Systems
  3.2 Semantics of Structural UML Models
  3.3 Semantics of UML Sequence Diagrams
  3.4 Semantics of UML Statecharts in PVS
  3.5 Tracking Inconsistencies in Integrated Platforms
  3.6 Enhancing Structured Reviews with Model-Based Verification
  3.7 Summary of Major Achievements
    3.7.1 Semantic Definitions for UML Notations
    3.7.2 A Framework for Formal Development ODSs
    3.7.3 CASE Tool Support
4 Conclusions and Future Work
  4.1 Conclusions
  4.2 Future Work
A Formal Development of Open Distributed Systems: Towards an Integrated Framework
B Towards formalization of Structural UML Models in PVS
C An Integrated Framework for Formal Development of Open Distributed Systems
D A Framework for Semantics of UML Sequence Diagrams in PVS
E Semantics of UML Statecharts in PVS
F Tracking Inconsistencies in an Integrated Platform
G Enhancing Structured Review with Model-based Verification
H Formal System Development Using Method Integration: a Case Study

Executive Summary

The Unified Modeling Language (UML) [79, 91, 11] is an important industry standard for modeling software systems, standardized by the Object Management Group (OMG), that has rapidly become popular in the software community. Its popularity can largely be attributed to its graphical, intuitively understandable notations and to its support for encapsulation, data abstraction, extensibility, and reusability. It is indisputable that UML reflects some of the best modeling experience and incorporates notations that have proven useful in practice. Using UML for effective formal analysis in industrial settings could, however, be problematic because its graphical notations lack precise semantic definitions.

The lack of firm semantic foundations for UML modeling constructs can lead to a number of problems: understanding of the models can be more apparent than real; developers may waste considerable time resolving disputes over the usage and interpretation of notations; and model analysis and communication can be difficult [42, 100]. Defining precise semantics for a modeling language is a prerequisite for developing semantically based CASE tools and for model communication. The primary objective of this thesis is to investigate the semantics of UML description techniques and to make them amenable to rigorous model analysis by transforming them into semantic models.
The specification language of the Prototype Verification System (PVS) [81, 82, 97] is used as the underlying semantic domain. A general framework for transforming graphical UML models into formal descriptions in the PVS specification language is also proposed. This paves the way for formal development of systems through systematic transformation of UML models. The framework is used to transform UML modeling constructs, namely static structural constructs such as class diagrams and dynamic behavioral constructs such as sequence diagrams and statecharts, into semantic models in the PVS specification language. Transforming UML models into corresponding semantic models in PVS enables rigorous model analysis using the formal techniques of PVS and its tools, such as the type-checker, theorem-prover, and model-checker.

Analyzing the semantic models of reasonably large systems may involve processing a large volume of software artifacts, which calls for mechanized support, a prerequisite for large-scale application of formal analysis techniques. To this end, we have developed a platform that integrates a UML CASE tool and the PVS tool set. The platform supports formal development of distributed systems from requirement capture to code production, and allows system designers to analyze the graphical models they have developed while the formal details are processed at the back-end.

This work is part of a long-term vision to explore how formal methods can be used to underpin practical tools for analyzing UML models. It contributes to the ongoing effort to meet the needs of the software industry (improved quality and reliability, and lower production cost) by providing a mathematical basis for the UML modeling techniques, with the aim of clarifying the semantics of the language as well as supporting the development of semantically based CASE tools.

Organization of the Thesis

The thesis is organized into several chapters.
In Chapter 1, the problem to be addressed is introduced. In addition, relevant aspects of formal methods and semantics, and the modeling notations and methods involved in this work, namely UML and PVS, are briefly introduced. In Chapter 2, some of the central concepts in the formalization of OO modeling techniques are discussed, and a literature survey of the formalization of OO modeling languages is presented, with emphasis on formal semantics for UML notations. In Chapter 3, a brief summary of the publications constituting the thesis and of the main achievements is presented; the full texts of the publications are included as appendices. Finally, in Chapter 4, concluding remarks and future research issues are presented.

List of Contributions

The thesis consists of a number of stand-alone publications, each of which addresses a specific research issue. A roman-numbered list of the publications is given below. In later sections, we refer to the publications by their respective numbers in the list. The publications are listed in the order in which they are summarized in Chapter 3, to obtain a logical flow. The versions of the publications included in the sequel may differ from the published ones due to minor editorial fixes, reformatting necessary to give the thesis a uniform layout, and, in some cases, discussions of new issues.

[I] I. Traoré, D. B. Aredo and K. Stølen: Formal Development of Open Distributed Systems: Towards an Integrated Framework, in the Proc. of the Workshop on Object-Oriented Specification Techniques for Distributed Systems and Behaviors (OOSDS'99), Sept. 27, 1999, Paris, France.

[II] D. B. Aredo, I. Traoré and K. Stølen: Towards Formalization of Structural UML Models in PVS, Research Report No. 272, Department of Informatics, University of Oslo, August 1999. Presented to the 11th Nordic Workshop on Programming Theory (NWPT'99), October 1999, Uppsala, Sweden, pp. 49.

[III] I. Traoré, D. B. Aredo and Hong Ye: An Integrated Framework for Formal Development of Open Distributed Systems, Journal of Information and Software Technology (IST), Elsevier Science, a Special Issue on Software Engineering, Applications, Practices and Tools, from the ACM SAC 2003, vol. 46, no. 5, pp. 281-286, April 15, 2004. An earlier version appeared in the Proc. of the ACM Symposium on Applied Computing (SAC 2003), March 9-12, 2003, Melbourne, Florida, USA.

[IV] D. B. Aredo: A Framework for Semantics of UML Sequence Diagrams in PVS, Journal of Universal Computer Science (J.UCS), Springer-Verlag Co. Pub., vol. 8, no. 7, pp. 674-697, July 2002.

[V] D. B. Aredo: Semantics of UML Statecharts in PVS, in the Proc. of the 7th International Multi-conference on Systemics, Cybernetics and Informatics (SCI2003), July 27-30, 2003, Orlando, FL, USA.

[VI] I. Traoré, D. B. Aredo and K. Stølen: Tracking Inconsistencies in an Integrated Platform, Research Report No. 274, Department of Informatics, University of Oslo, Norway, August 1999.

[VII] I. Traoré and D. B. Aredo: Enhancing Structured Review with Model-based Verification, the IEEE Transactions on Software Engineering (to appear). An earlier version appeared in the Proc. of the CAV'01 Workshop on Inspection in Software Engineering (WISE'01), July 2001, Paris, France.

[VIII] D. B. Aredo and O. Owe: Formal System Development Using Method Integration: a Case Study, Research Report No. 308, Department of Informatics, University of Oslo, February 2004.

The publications coauthored with Prof. Stølen were published while he was my principal supervisor. The cooperation with Dr. Traoré started when he held a one-year post-doc position associated with the ADAPT-FT project, which also included my own doctoral fellowship.

Other Related Publications

My contributions to the following publications result from work done in the context of the thesis project, but the publications are not included in the thesis.
Cooperation with the coauthors started at the time they were working on the ADAPT-FT project (http://www.ifi.uio.no/~adapt/).

• E. B. Johnsen, W. Zhang, O. Owe and D. B. Aredo: Combining Graphical and Formal Development of Open Distributed Systems, M. Butler, L. Petre and K. Sere (Eds): IFM2002, LNCS 2335, pp. 319-338, Springer-Verlag, Berlin, Heidelberg, 2002.

• E. B. Johnsen, W. Zhang, O. Owe and D. B. Aredo: Specification of Distributed Systems with a Combination of Graphical and Formal Languages, in the Proc. of the 8th Asia-Pacific Software Engineering Conference (APSEC2001), IEEE Press, December 4-7, 2001, Macau SAR, China.

• W. Zhang, E. B. Johnsen, O. Owe, and D. B. Aredo: Integrating UML and OUN for Specification of Open Distributed Systems, in the Proc. of the Symposium on Visual Languages and Formal Methods, 2001 IEEE Symposium on Human-Centric Computing Languages and Environments, September 2001, Stresa, Italy.

Chapter 1
Introduction

1.1 Background

Distributed computing environments are among the most active research areas in Computer Science. They have gained considerable popularity among system developers and researchers, mainly because modern computing tasks are inherently distributed. Distributed systems provide several substantial benefits over their centralized, sequential counterparts. Reduced incremental costs, extensibility, better reliability and response, and high performance are among the potential advantages of distributed computing environments over centralized systems [110]. At the same time, their intrinsic characteristics, such as resource sharing, openness, concurrency, non-determinism, transparency, and fault tolerance, make the design and development of distributed systems exceedingly difficult [28]. Consistency issues frequently arise, for instance, from the separation of processing resources and from concurrency in distributed systems.
Hence, the benefits they bring are not readily available; they can be achieved only at the cost of an exceedingly difficult design and development process.

Object-oriented analysis and design (OOAD) methods have features such as encapsulation, restructuring, reusability, and data abstraction, which make them effective for describing open distributed systems (ODSs). The RM-ODP [56], for instance, advocates the use of OOAD methods in the development of ODSs. Several object-oriented design and analysis methodologies and notations have been proposed since the mid-1970s [89, 98]. The most recent and popular notation is the Unified Modeling Language (UML) [79, 91, 11], which resulted from a unification of the modeling concepts of the OMT [90], Booch [10], and Object-Oriented Software Engineering (OOSE) [54] methods.

UML became popular in the software community mainly because of its visual, intuitively appealing graphical notations and its useful structuring mechanisms. It is based on a set of OO description techniques and modeling notations. It is indisputable that UML reflects some of the best modeling experience and incorporates notations and techniques that have proven useful in practice. However, using UML in the rigorous analysis and design of critical systems in industrial settings could be problematic due to the lack of precise semantics and rigorous analysis techniques. The missing formality in OO modeling techniques hampers the evaluation of UML models for completeness, consistency, and the content of requirement and design specifications. Without precise semantic definitions for UML modeling notations, integrating UML with other rigorous software development methods would be difficult [12].

Formal development methods (FDMs) play an important role in addressing the problems inherent in informal (or semi-formal) OO modeling notations. Traditionally, FDMs are involved in the software development process to support precise specification of computerized systems.
They provide strong support for system descriptions with precise meanings, and concise strategies for decomposition, design, verification, and validation. These are crucial requirements in developing highly reliable systems, mainly because of the large volume of information involved in detailed system description and analysis. Unfortunately, none of the existing formal methods addresses all issues related to the features that characterize contemporary distributed systems [29].

The problem can be addressed in several different ways. A naive approach would be to build, from scratch, a completely novel methodology that addresses all issues central to the formal development of distributed systems. This approach is, however, very challenging and economically inefficient, as argued by Abadi et al. [1]:

"A new class of systems is often viewed as an opportunity to invent a new semantics. A number of years ago, the new class was distributed systems. More recently, it has been real-time systems. The proliferation of new semantics may be fun for semanticists, but developing a practical method for reasoning formally about systems is a lot of work. It would be unfortunate if every new class of systems required inventing new semantics, along with proof rules, languages, and tools."

In the spirit of Abadi et al., a manageable approach should attempt to extend, generalize, integrate, and tune existing methods to address problems specific to distributed computing environments. This approach consists of a series of tasks that need to be accomplished.

- Firstly, existing modeling techniques and formal methods, and their respective CASE tools, need to be investigated in order to determine their strengths and weaknesses in the context of the development of distributed systems. The evaluation of several existing methods and CASE tools undertaken by Stølen et al. [102, 103] found UML techniques suitable for modeling distributed systems, and identified the PVS specification language as a suitable underlying semantic foundation for the formalization of UML notations;

- Secondly, a framework for integrating the chosen modeling notation(s) and formalism(s) should be developed. The integrated framework can be geared towards the description and analysis of specific features of distributed systems. The integration combines one or more graphical modeling notations that are suitable for addressing development issues intrinsic to distributed systems with a formalism that enables rigorous model analysis; and

- Thirdly, a CASE tool that supports the development of distributed systems needs to be developed to automate the step-wise development process from requirement capture to code production. Such a tool is crucial because rigorous reasoning about system models may involve a volume of software artifacts too large to be manipulated manually.

There are clear advantages to integrating semi-formal graphical OO modeling techniques with a mathematically based formalism into a development framework that allows rigorous model analysis. Such an integration, however, may raise serious problems, such as consistency issues, that need to be carefully addressed to obtain a correct and reliable development framework. Checking consistency across different aspects of the system is necessary to establish that different specifications do not impose conflicting requirements. The mechanism for consistency checking depends on the features of the notations being integrated, and different combinations require different approaches. Techniques for checking consistency between different viewpoint specifications in open distributed processing (ODP) have been addressed thoroughly in the literature [59, 68, 9, 14].

1.2 The Problem Statement

The design and development of distributed systems are difficult due to their complexity, heterogeneity, distribution, and large size. Object-oriented modeling notations such as the Unified Modeling Language (UML) [79] provide the rich structuring mechanisms necessary to manage the complexity of descriptions of distributed systems. UML has become popular among software developers due to its graphical notations, which are easy to learn and use. One of its major limitations, however, is that the semantic definitions of its notations are given in natural language. The lack of mathematically based semantic definitions for the UML notations constrains its effectiveness in rigorous model analysis, which in turn hampers its application to the development of critical systems in industrial settings.

A well-defined and fully explored semantic definition for UML notations is crucial, as the lack of such a firm semantic foundation can make understanding of models more apparent than real [99]. Without it, it is difficult to determine whether a design is consistent, whether a design modification is correct, or whether a program correctly implements a design. Evaluating the completeness and consistency of requirements and design specifications is also difficult. Hence, there is a strong need for precise semantic definitions for UML notations.

Formal development techniques can be used to achieve the level of rigor necessary for the development of critical systems. However, due to the esoteric features of formal methods, software developers will not, in the foreseeable future, be willing to use abstract formal languages and notations to design software systems [74]. Hence, an optimal solution should strike a balance between ease of use and level of rigor.
Motivated by the need for a development framework, and a supporting CASE tool, that is easy to use and at the same time allows rigorous analysis, this thesis investigates how the diagrammatic UML notations and the PVS specification language can be integrated to support formal development of open distributed systems. The framework integrates best practice in software development using visual modeling languages such as UML with the mathematically based analysis techniques underlying formalisms such as PVS, in order to support rigorous development. It allows developers to work on the graphical models they have developed while the formal "stuff" is processed at the back-end.

Formal reasoning about a software system of real-world size involves manipulating a volume of software artifacts that is too large and complex to handle manually. Automation of the rigorous analysis is therefore essential. To this end, a prototype CASE tool that supports the framework has been developed by integrating the respective CASE tools of UML and PVS into a single platform. The platform allows modeling in UML, mechanized transformation of the UML models into PVS specifications that are amenable to rigorous analysis, and formal reasoning using the PVS tool set to reveal any inconsistency or incompleteness.

1.3 Formal Methods

A formal method (FM) refers to the use of mathematically based concepts and techniques in the development of computer systems. An FM is characterized by a formal specification language and a set of rules governing the manipulation of expressions in the language [113]. A specification language is the specifier's primary tool during the initial stages of system development. Choosing appropriate notations for the description of a system is not as trivial as one might think, because there is a trade-off between the expressiveness of the specification language and the level of abstraction it supports [13, 113].
Specification languages with wider 'vocabularies' and more constructs can support the description of the particular class of systems one wants to deal with, but they may incline towards a particular implementation. Languages with smaller 'vocabularies', on the other hand, offer a high level of abstraction and little implementation bias (e.g. the language of Communicating Sequential Processes (CSP) [52] has only processes and events as basic entities).

FMs can be used for different purposes, in many ways and styles, and with varying rigor. The earliest FMs were concerned with proving programs correct: assuming that a correct specification is available, the goal is to show that a program in some concrete programming language satisfies the specification. Contemporary FMs provide frameworks for specifying, developing, and verifying systems in a systematic way. They also provide mechanisms for proving that a given system specification is realizable, that the specification is implemented correctly, and that the system has certain properties, without necessarily running the system to determine its behavior. FMs aim at using sound mathematical techniques, usually provided through specification languages, to make software development activities precisely defined, checked, and ultimately automated. The mathematical basis allows precise definition of notions such as consistency and completeness and, more relevantly, specification, implementation, and correctness [113].

The primary purpose of using FMs is to help engineers construct more reliable systems. They can be used at all stages of the software development process, from initial capture of the customer's requirements through system design, implementation, testing, debugging, maintenance, verification, and evaluation. When used at earlier stages of system development, FMs can reveal design flaws that might otherwise not be discovered before the more costly testing and debugging phases.
When used at later stages of development, FMs can help developers determine the correctness of system implementations and the equivalence of different implementations.

The tangible results of applying FMs to system development are formal specifications: precise and usually concise system descriptions. A specification may serve as a contract and a means of communication among the stakeholders: customers, specifiers, implementers, etc. If the syntax of a specification language is defined explicitly, a syntactic analysis tool can be developed. Furthermore, if the semantics of the language is sufficiently restricted, rigorous model analysis can be performed, and tools can be developed to automate the analysis. Hence, formal specifications have the advantage over their informal counterparts of being amenable to rigorous and mechanized analysis and manipulation. Another advantage of using FMs in system development is that they allow developers to concentrate on what is required at an abstract level, i.e. developers focus directly on the aspects of interest and avoid the distractions entailed by implementation details [77].

By relieving the mind of all unnecessary work, a good notation sets it free to concentrate on more advanced problems, and in effect increases the mental power of the race.
- Alfred North Whitehead [13]

The formal specification and verification process involves considerable syntactic detail and requires careful planning and organization to obtain modular system specifications. Strong tool support is a prerequisite for the effective use of formal development methods on real-world problems. With the introduction of CASE tools, in particular theorem-provers and model-checkers, the construction of mechanically and interactively checkable proofs of consistency and well-foundedness has become feasible [13]. Most formal methods incorporate a theorem-prover as part of the method itself, e.g. PVS [81, 82] and HOL [45].
1.4 Involved Notations and Formalisms This thesis has been undertaken within the ADAPT-FT project1 . The decision to use existing languages such as the UML [79] and the PVS [81, 82], and to create a new language, known as the Oslo University Notation (OUN) [80], was taken at the project level based on the result of investigation that compared several specification languages and formalisms [102]. The main objective of the ADAPT-FT project was to adapt, tune, develop, and extend formal methods towards the special needs of distributed systems. To achieve this, an underlying semantic foundation was needed, preferable a foundation already implemented with a series of powerful tools. PVS was a natural choice in this respect, especially due to its strong type systems and functional sub-language, covering inductive data types and inductively defined functions, and its reasoning capabilities and tools, including some model-checking and theorem-proving facilities. As UML was emerging as an industry standard for object-oriented modeling languages and gaining popularity among software developers, it was chosen as one component of the ADAPT-FT integrated platform. PVS provides a vehicle for defining formal and precise semantics of the UML and OUN languages and for defining the associated specification formalism, including concepts for refinement and composition. At the same time, it allows development and reuse of the semantic definitions in the design of tools, such as forms of reasoning tools. Even though the nature of PVS may be mathematically challenging to software engineers, a semantic foundation from which engineering tools that are less esoteric may be developed is needed. 
For instance, in the ADAPT-FT platform, which integrates UML, OUN, Java and PVS, with translations from UML to OUN and PVS and from OUN to Java and PVS, one may develop tools at the level of UML diagrams or OUN programs, while the implementation of the tool is done at the PVS level (by means of PVS translations). Tools giving yes/no answers require no insight into PVS, and may provide useful feedback to the software engineer. It would of course be desirable to have tools giving UML or OUN related feedback, built from PVS related tools; however, this is beyond the scope of the ADAPT-FT project.

¹ http://www.ifi.uio.no/~adapt/

1.4.1 The Prototype Verification System

The Prototype Verification System (PVS) [81, 82] is an environment for formal specification of systems. It combines a highly expressive specification language with a powerful interactive theorem-prover that provides mechanized support for verification and validation. PVS is mainly intended for formalization of requirements and design-level specifications, and for analysis of problems. It is being used for verification of complex software systems, especially in the aeronautics industry. The PVS specification language extends a strongly typed higher-order logic of total functions. Its type system is augmented with features such as predicate subtypes, dependent types, and recursive data types. These features are vital for facile mathematical expression as well as symbolic manipulation [97]. Types provide a useful structuring mechanism within a specification language. They also allow early detection of a large class of syntactic and semantic errors. A distinctive feature of the PVS specification language is predicate subtyping. Predicate subtypes and dependent types are powerful specification concepts, as a lot of information can be encoded into the types.
Predicate subtyping enables us, for instance, to deal with partial functions in a logic of total functions by restricting the domain of definition to an appropriate sub-domain. Type checking with predicate subtypes is, however, undecidable and generates proof obligations, the so-called Type Correctness Conditions (TCCs), whenever type conflicts cannot be resolved. For instance, the arithmetic division operation can be introduced with its divisor domain given as the subtype of numbers consisting of the nonzero numbers. If division is applied to a term not known to be nonzero, a proof obligation is generated. In developing specifications using predicate subtypes and dependent types, the TCCs may provide useful information about the consistency and completeness of the specification. In practice, most of the TCCs are discharged automatically by the theorem-prover, whereas more involved ones require user interaction.

Specifications in PVS are organized into, possibly parameterized, hierarchies of theories. Parameterized theories provide a mechanism to develop generic and reusable templates of specifications and proofs. A theory may contain assumptions that are used to specify constraints on the parameters of the theory, definitions, axioms and theorems. Axiomatic specifications are effective for certain problem domains, but may introduce inconsistencies. Definitional specifications avoid this problem and are guaranteed to provide conservative extensions. PVS supports both the axiomatic and the definitional paradigm. Modularization of large PVS specifications is achieved by structuring them into hierarchies of theories by using the IMPORTING clause, which makes previously defined theories available. When a parameterized theory is instantiated, proof obligations are generated in accordance with the assumptions on the parameters.

In this section, we presented a brief overview of the PVS environment.
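
The division example used above to illustrate predicate subtyping can be mimicked, very loosely, in an ordinary programming language. The following Python sketch is only an analogy under stated assumptions: the names PredicateSubtype, Checker, NONZERO and divide are hypothetical and are not part of PVS, and the check is performed at run time, whereas PVS generates TCCs during type checking and leaves them to the prover. The sketch records one obligation per use of the restricted operation and marks it as discharged or unproved.

```python
from dataclasses import dataclass, field
from fractions import Fraction
from typing import Callable, List


@dataclass
class PredicateSubtype:
    """A base type restricted by a predicate, e.g. the nonzero numbers."""
    name: str
    predicate: Callable[[Fraction], bool]


# Hypothetical stand-in for the subtype of nonzero numbers.
NONZERO = PredicateSubtype("nonzero", lambda x: x != 0)


@dataclass
class Checker:
    """Records one obligation per use of an operation with a restricted domain."""
    obligations: List[str] = field(default_factory=list)

    def require(self, subtype: PredicateSubtype, value: Fraction, context: str) -> bool:
        # Evaluate the predicate and log the outcome as an obligation.
        ok = subtype.predicate(value)
        status = "discharged" if ok else "unproved"
        self.obligations.append(f"{context}: {value} in {subtype.name} [{status}]")
        return ok

    def divide(self, x: Fraction, y: Fraction) -> Fraction:
        # Division is total on its declared domain: the divisor must be nonzero.
        if not self.require(NONZERO, y, "divide"):
            raise ValueError("divisor is not in the nonzero subtype")
        return x / y
```

A call such as Checker().divide(Fraction(6), Fraction(3)) succeeds and logs a discharged obligation, while a zero divisor leaves an unproved obligation and is rejected.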
For a more detailed description of PVS, interested readers should refer to the PVS language reference [81] and the prover guide [96]. The tutorial by Rushby [93] gives a good introduction to the PVS environment.

1.4.2 The Unified Modeling Language

The Unified Modeling Language (UML) [79, 91, 11] is a notation for specifying, visualizing and documenting artifacts of object-oriented software-intensive systems. UML is a de facto industrial standard for OO modeling languages. At the time this thesis work was undertaken, the accepted standard version was UML v1.3 [79]. UML was mainly intended to be a general-purpose OO modeling language that supports encapsulation, data abstraction, reusability, and adaptation and extension mechanisms towards specific application domains. It was also intended to be a visual, graphical and intuitively understandable notation that is complete in the sense that it can be used to describe and model all aspects of a system appropriately [36]. In order to meet the intended objectives, UML combines several modeling 'sub-languages', each of which is suitable for describing a specific aspect of an OO system design. That is, a system is modelled by a set of sub-models, called views, each of which focuses on a specific system aspect. A given aspect of a system can be modelled from different perspectives, thus leading to overlapping and even redundant or conflicting specifications of certain system aspects. As argued by Engels et al. [36], the approach of providing overlapping, non-orthogonal sub-models eases the specification process, as it allows incremental description of an aspect by inter-relating it to other aspects. On the other hand, the use of different, even non-orthogonal, sub-languages for modeling a system increases the danger of inconsistencies between the sub-models, and requires additional mechanisms to prevent the inconsistencies.
This calls for a common semantic foundation in which the semantics of the modeling constructs of the involved sub-languages are defined, to allow rigorous model analysis and to ensure consistency and completeness of the sub-models. System aspects can be categorized into static structural aspects and dynamic behavioral aspects. UML consists of several description mechanisms necessary to specify static structural, dynamic behavioral, and model management aspects.

The structural modeling constructs include, among others, class diagrams and object diagrams, which are used to model structural aspects at type and instance levels, respectively. They originated from Entity-Relationship diagrams [22] and provide a means to specify the structure of objects and possible structural relationships among them. They are especially useful to capture system requirements at early development phases, and to extract classes and attributes from requirement descriptions. Among the basic structural relationships are inheritance, aggregation and association. An aggregation is a special type of association that describes a dependency between two objects: a 'whole' and a 'part'. In UML, two types of aggregation are distinguished: physical and logical. In a physical aggregation, known as a composition, an object can be a part of at most one aggregate object, i.e. there is no sharing of parts between composite objects. There is no such restriction on logical aggregation.

Behavioral modeling constructs consist of, among others, interaction diagrams and statechart diagrams. A UML sequence diagram, a variant of the classical message sequence chart (MSC) [53, 25], is a kind of interaction diagram that is used to describe a single flow of communication or a subset of a set of communication flows in a system. Emphasis is put on the description of communication between objects or groups of objects, shown visually in time order.
A collaboration diagram is another kind of interaction diagram, organized around object roles to explicitly show relationships among the objects. Unlike a sequence diagram, a collaboration diagram does not show time flow; thus the order of messages and concurrent threads is determined by numbering. UML statechart diagrams are based on the classical statecharts invented by Harel [47]. A statechart diagram basically consists of states and state transitions, and describes the life cycle of a model element and its reaction in response to the events it receives. A state represents a condition during the life cycle of an object in which it responds only to certain events, or performs certain actions.

A complete system specification may involve several description techniques, each of which is effective for describing only certain aspects of the system, resulting in partial specifications, e.g. class, sequence, and deployment diagrams. Thus, it is necessary to define precisely how these partial specifications are combined into a complete and consistent system specification. Transforming UML modeling techniques into a common formal foundation, or possibly an integration of several formalisms, reduces the challenge of reasoning about consistency and completeness of system models. One of the main objectives of this thesis is to contribute to the ongoing effort to provide a semantic foundation for the UML notations.

1.5 Formal Semantic Definitions

In a conventional textual notation, syntax is described as a set of characters and the possible sequences of those characters. The set of all syntactically valid sequences of characters is referred to as a language. When graphical notations are involved, the situation becomes more complicated, since the syntax does not deal with sequences of characters but rather with graphical constructs. Syntactic issues focus purely on the notation, disregarding any intention behind the notation.
A syntax defines a language of well-formed declarations and statements, whereas the semantic definitions determine the meaning of every construct of the language in question. In general, a formal semantic definition is a mapping of a given notation, usually called the syntactic domain, into a suitable and well-known formal notation, usually called the semantic domain. Given a modeling language, providing formal semantic definitions for its constructs consists of the following major steps:

- defining the syntactic notation that provides an abstraction of the language. The syntax of a language defines the basic constructs that exist in the language and how constructs are built up from the basic constructs, and often provides an algorithm to transform or parse the language;
- identifying a semantic domain, i.e. an abstraction of reality that describes important aspects of the systems to be constructed; and
- providing definitions of semantic mappings from the syntactic domain into the semantic domain.

If a semantic mapping M : [N → S] is explicitly defined, then it is possible to reason about its correctness. Defining M algorithmically enables software engineers to translate documents in notation N into documents in the specification language of the underlying semantic foundation S, and to use the verification techniques of S [92]. For instance, suppose that a predicate P : [S → bool] describes a consistent and correct implementation of a specification written in S. A requirement for this property to hold is that no contradiction is found in the specification. Then, software engineers can apply this predicate to documents translated from N to S. A drawback of this approach is that the engineer must be able and willing to understand both the syntactic domain N and the semantic domain S, which is typically not the case, as engineers want to work only with notation N.
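
The roles of the mapping M and the predicate P can be illustrated with a deliberately small Python sketch. Everything in it is an assumption made for illustration only: the syntactic domain N is taken to be lists of range declarations on named attributes, the semantic domain S maps each attribute to the interval obtained by intersecting its declared ranges, and P holds when no interval is empty, i.e. when no contradiction is found. The function names semantic_mapping, is_consistent and check are hypothetical and have no counterpart in UML or PVS.

```python
from typing import Dict, List, Tuple

# Syntactic domain N: a document is a list of (attribute, low, high) declarations.
Document = List[Tuple[str, int, int]]
# Semantic domain S: each attribute is mapped to the interval of values it may take.
Meaning = Dict[str, Tuple[int, int]]


def semantic_mapping(doc: Document) -> Meaning:
    """M : [N -> S]. Intersect all ranges declared for the same attribute."""
    meaning: Meaning = {}
    for name, low, high in doc:
        old_low, old_high = meaning.get(name, (low, high))
        meaning[name] = (max(old_low, low), min(old_high, high))
    return meaning


def is_consistent(meaning: Meaning) -> bool:
    """P : [S -> bool]. True when no attribute has an empty interval."""
    return all(low <= high for low, high in meaning.values())


def check(doc: Document) -> bool:
    """The composite P(M(d)), which can be invoked without inspecting S."""
    return is_consistent(semantic_mapping(doc))
```

With these definitions, check([("age", 0, 150), ("age", 18, 65)]) holds, whereas two disjoint ranges declared for the same attribute make the check fail. An engineer calling check works entirely with documents of N, which is the kind of separation the text argues for.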
A better approach would emerge if correctness and consistency of the semantic definitions for all constructs of notation N were proved. Symbolically,

∀ d ∈ N : P(M(d))

Then, software engineers using notation N could be sure that its constructs have consistent semantic definitions without necessarily being explicitly exposed to the underlying semantic domain.

The static semantics of a modeling language describes how instances of the modeling constructs of the language should be related to each other. For example, the static semantics of the UML modeling abstractions is given as well-formedness rules that are described using the Object Constraint Language (OCL) [79, 112] and a natural language, English. OCL is based on first-order logic; it is not expressive enough to capture all aspects of UML models, and does not provide sufficient support for rigorous model analysis [39]. Thus, a formalism is needed that has more expressive power, enables us to define semantics for UML modeling techniques, and supports rigorous model analysis. The PVS specification language [83] is found to be well suited as an underlying foundation for UML models, as it is based on higher-order logic, is highly expressive, and provides a general semantic foundation. In this thesis, we investigate UML modeling techniques in order to provide semantic definitions for a subset of the UML constructs by mapping them into entities in the specification language of PVS. Moreover, a formal development framework for open distributed systems, based on the method integration approach and the semantic definitions, is proposed.

Providing an explicit definition of a semantic domain is important, as it allows one to understand the kinds of systems the language is intended for, and it is a prerequisite for comparing different semantic definitions [92].
Another advantage of providing formal semantic definitions for UML constructs is that it allows the use of verification and validation techniques, such as theorem-proving and model-checking, which were previously enjoyed only by formal specification languages.

Chapter 2 Formalization of UML Notations

2.1 Motivation

The popularity of OO software development techniques such as UML [79] and the Object Modeling Technique (OMT) [88] is primarily due to their intuitively appealing graphical modeling constructs and powerful structuring mechanisms, which are crucial for software engineering. The importance of modeling techniques in software engineering might be comparable to that of the mathematical techniques invented in the second half of the 19th century to model physical processes, and establishing their scientific foundations seems to have great significance [12, 16].

Despite their strengths in expressing a wide range of concepts central to software engineering, the application of informal OO development techniques to non-trivial development projects can be problematic [39]. A major source of problems is the lack of precise semantic definitions for the modeling constructs, which may lead to misinterpretation of models. Without a precise semantic foundation, the consistency and completeness of models cannot be checked formally. Moreover, developing semantically-based CASE tools for automation of the formal verification process may not be feasible.

A requirement specification of a software system is a description of the objectives and functionalities of the system. It provides a basis for measuring the quality of the end product, and for guiding the design and implementation of the system. A precisely formulated requirement specification that clearly describes the functionalities of a system is crucial for successful completion of the development project.
Errors are most likely to be introduced during the early phases of the development process. They can severely affect the reliability and integrity of the system in question, and fixing them during later phases of the software life-cycle is more expensive than during the earlier phases [8].

Use of formal methods and notations to describe the syntax and semantics of modeling languages has several beneficial effects. A rigorously defined semantic foundation serves as a complete and precise description of the meaning and effect of every syntactic construct of the language. In a development process that is based on such a rigorous foundation, inconsistencies, incompleteness, and ambiguities in requirement specifications can be detected and corrected in earlier phases of development, provided that the underlying formal method forces specifications to behave as required. Formal development methods also make it possible to precisely describe and rigorously reason about important system properties, both static structural and dynamic behavioral. For instance, to check that a given implementation satisfies the requirements stated in a specification of the system, i.e. to verify an implementation against a specification, it is necessary to provide their interpretations in a common semantic foundation. The semantic foundation provides an unambiguous benchmark against which the level of understanding of developers or the performance of CASE tools can be measured [58].

Formal semantic definitions are essential in establishing properties of a syntactic language, e.g. its consistency and well-formedness. For a given modeling language L, let us denote its syntactic notation by N_L and its semantic foundation by S_L, and suppose that a semantic transformation M : [N_L → S_L] is correctly defined. Formal analysis techniques available in the underlying semantic foundation can be used to argue about well-formedness, consistency, and completeness of models given in the syntactic notation.
For instance, suppose that p : PRED[S_L] specifies the property that a given system specification is not implementable. Then, to prove that a given description d : N_L of the system is realizable, we need to ensure that the following invariant holds:

∀ (d : N_L) : (d ∈ Spec ∧ Impl(d)) ↔ ¬ p(M(d))

where Spec is the set of all specifications of the system in question. Hence, once a suitable semantic domain is identified and a transformation of syntactic constructs into the semantic entities is correctly defined, more reliable system specifications can be achieved, and the properties of the system can be argued about in terms of the elements of the underlying semantic domain. As a result, some questions about system behaviors reduce to symbolic computations that can be checked, even mechanically.

Another important benefit of using formal methods is the transfer of concepts such as refinement, abstraction, composition, etc., and the corresponding analysis techniques, from the formal semantic foundation to the syntactic domain. For instance, suppose that ⪯ : [S_L → S_L] denotes a refinement relation in the semantic domain. If ⪯′ : [N_L → N_L] is the corresponding relation defined in the syntactic domain, then the following condition must hold for ⪯, ⪯′ and M:

∀ (d, d′ : N_L) : (d ⪯′ d′) ↔ (M(d) ⪯ M(d′))

Precise semantic definitions are useful not only to system developers, but also to tool vendors, methodologists (those who create methods), and method experts (those who use the methods and know them in detail). They allow tool vendors to develop more reliable and semantically-based CASE tools.

The use of formal methods in software development is, however, not without drawbacks. The major concern among developers is the esoteric nature of formal methods, which has remained a major barrier to their whole-scale utilization in industrial settings.
Despite a tremendous amount of work on making formal development techniques acceptable to the industrial software development community, little progress has been made and there is still a lot to be done. The lack of powerful CASE tools that support the formal development process also contributes to the problem.

2.2 Formalization Approaches

Several works have attempted to provide a mathematical basis for the concepts underlying the UML notations using different formalization approaches. Some tried to formalize the UML modeling techniques directly by providing a mathematical foundation for their concepts; others use one or more formalisms as the underlying foundation and establish a correspondence between elements of the informal UML notations and the formal entities of the domain; still others extend a given formal specification technique with OO features. In general, three approaches to the formalization of OO modeling techniques are identified [43]: the supplemental, OO-extension, and method integration approaches.

In the supplemental approach, informal OO modeling constructs are replaced by more formal constructs. The work of Moreira et al. [75] is based on the supplemental approach. In the OO-extension approach, a novel or existing formal notation is extended with OO features, thus making it more compatible with the OO modeling language. For example, VDM++ [33], Z++ [63], and Object-Z [32] resulted from the OO-extension approach. A major limitation of these approaches is that they are not user friendly, as developers still have to deal directly with a certain amount of esoteric formal artifacts, which is a significant barrier to whole-scale utilization of formal methods in industrial settings. Although a rich body of formal notation may be obtained, the OO-extension approach often results in more complex semantics, and suffers from the lack of supporting CASE tools [37, 21].
Method integration is a more workable approach to formalization that combines (informal or semi-formal) OO modeling techniques with suitable formalism(s), making them more precise and amenable to rigorous analysis techniques [42]. It is the most commonly used approach to the formalization of OO modeling languages, and allows developers to directly manipulate the graphical models they have created without having in-depth knowledge of the underlying formal "stuff", which is processed at the back-end [37]. The works of Bruel et al. [21], France et al. [43], and Shroff et al. [99] are based on the method integration approach and advocate its use in the software development process in industrial settings. Since the involved languages are independent and their boundaries are preserved, checking consistency across the boundaries is necessary.

The semantics of a modeling language is usually formalized by mapping the syntactic elements of the language into some well-defined and carefully selected semantic foundation that enables us to describe the intended meanings of the modeling constructs. In general, there are two well-established methods for the formalization of distributed computations: one focuses on the events of message communication among system components (such methods are generally based on process algebras), whereas the other focuses on the states of the components and their transitions [93]. PVS has been used in both methods [34, 57].

The need for integrated development environments is increasingly felt in software engineering. It seems that if a tool vendor wants to propose a cutting-edge tool, it has to use an integrated approach in some way. In the sequel, the method integration approach is adapted to propose semantic definitions for UML modeling techniques using the specification language of PVS [81, 91, 93] as the underlying semantic foundation.
The resulting semantic models allow well-formedness and consistency checks, which in turn enable us to formally argue about the behaviors of the systems we are modeling.

2.3 State-of-the-Art

In this section, we present a survey of the literature on the formalization of UML modeling notations, on semantic definitions for its notations, and on object-oriented design and analysis.

A significant amount of research work has been undertaken towards improving the precision of OO modeling techniques by providing a mathematical basis for the concepts underlying the models [15]. The task of formalizing OO modeling techniques has been addressed using various available formalisms and approaches. Since the inception of UML, several researchers have been working on providing formal semantics for its constructs. In most cases, the works focus exclusively on a subset of the UML notations, for example on static structural modeling techniques such as class diagrams and object diagrams [21, 38, 39, 41], or on dynamic behavioral modeling techniques such as sequence diagrams [18, 30] and statechart diagrams [31, 66, 65, 86, 94].

Several researchers and research groups are actively involved in the investigation of the semantics of UML modeling techniques. The pUML (precise UML) [85] group is one of the leading research groups in this area. It consists of several international researchers who share the aim of developing UML as a precise modeling notation [37, 38, 17, 15, 21, 43, 92]. The pUML group members are working towards making the core UML modeling concepts more precise and amenable to rigorous model analysis, and are concerned with the development of the new theories and practices required to construct tools that support rigorous application of UML modeling techniques.

In [37], Evans outlined a formalization of UML class diagrams using a diagrammatic transformation approach, and developed 'sound' rules for reasoning about the models.
The Z notation [101] is used to precisely represent the abstract syntax and well-formedness rules of UML class diagrams. The resulting representation is manipulated to identify some deductive transformation rules for class diagrams. Because the reasoning is based on manipulations of diagrams, Evans argues that this approach can be used by practitioners without recourse to complex linguistic proof techniques.

In their more recent work, Evans et al. [39] provided formal semantics for the graphical modeling language and developed rigorous analysis tools that allow developers to directly manipulate the graphical UML models. They argue that the method integration approach has a limitation in the context of industrial use of formal modeling techniques, as it requires in-depth knowledge of the underlying formal notation and its proof system. Though the authors claim that their approach is more efficient and easier to use, it is not economically feasible, as it requires building new analysis techniques and/or CASE tools from scratch when hundreds of them are available and can be extended, adapted, or integrated to suit our needs.

The Methods Integration Research Group (MIRG) at Florida Atlantic University conducted a considerable amount of work [42, 41, 99] on the formalization of structural OO modeling techniques. Their work is based on the method integration approach and combines the OO analysis techniques of the Fusion method [26] with the specification language Z [101], from which a mechanized environment called FuZE (Fusion/Z Environment) [20] has resulted. Basic concepts of structural UML modeling techniques such as classes, inheritance, aggregation, etc. are represented as Z schemas. The schemas are combined into a hierarchy of schemas that characterizes the overall system view. Invariants, usually expressed by annotations in structural UML models, are specified in the predicate part of Z schemas.
The type name of an attribute of a class corresponds to the type name of the attribute of the Z class schema. An attribute type is defined as a basic type or a schema in Z. Relationships such as association, aggregation, and generalization are also represented as Z schemas. A binary association is represented as a relation in which the role names are simply the names of the domain and range of the relation. An aggregation structure is represented hierarchically by including the Z schemas that represent the parts in the declaration part of the schema for the whole. In the formalization of generalization, the superclass is represented in the same way as any other class. A subclass is considered to be a subspace of the superclass instance space, and is formally defined as a Z state schema in which a variable of the superclass type is declared along with variables for those attributes of the subclass that are not attributes of the superclass.

The works [16, 17, 15] of a research group in the SYSLAB project at the Technical University of Munich, on providing precise semantics for UML modeling techniques, use an approach called the Mathematical System Model (MSM), which is based on the theory of streams and stream processing functions [19]. Description techniques such as message sequence charts (MSCs) and statecharts are adapted and specialized to allow precise semantic definitions. The authors claim that their approach provides integrated precise semantics that allows the definition of transformations between different specifications and a rigorous description of consistency conditions within and across the boundaries of different description techniques. Each document, e.g. an object diagram, is regarded as a constraint on a system model. In order to provide a common basis for defining integrated semantics for all description techniques, the mathematical framework is augmented by a notion of system model, i.e. a model that describes the overall system view.
Bourdeau et al. [12] provide formal semantics for object modeling diagrams, with emphasis on the Object Modeling Technique (OMT) [88], using algebraic specification techniques. A general framework for deriving modular algebraic specifications directly from diagrammatic object models is developed. The specification language Larch [46] is used as the underlying semantic foundation. The notion of instance diagrams [90] is used extensively in this work. The state space of an object model is, for instance, defined as the set of all instance diagrams of that object model.

The UML sequence diagram, a variant of the classical Message Sequence Chart (MSC) [53], is one of the dynamic modeling techniques of the UML notation. A semantic definition for MSCs is provided in Annex B [25] to the standard document for MSCs [53] in terms of a specific process algebra for which an operational semantics is provided. Other works on the semantics of MSCs are due to Mauw et al. [72, 71, 70], and provide formal semantics for basic MSCs based on process algebra. The authors justify the choice of process algebra as the underlying foundation, and argue that all features incorporated into the theory of MSCs, such as the state operator and the global naming operator, are related to topics in process algebra. Ladkin et al. [60, 62, 61] interpret an MSC as a set of traces of accepted externally observable events, while internal process computation is ignored. Our work published in [5] is based on a similar approach. It is argued that this interpretation results in a complete semantic model, as MSCs focus on communication events. Broy [18] provides semantics for MSCs based on the theory of stream processing functions. An MSC is interpreted as a set of traces of input/output events that may occur in the system it describes.

Some other works attempt to formalize UML notations by transforming them into a particular specification language.
For example, Lano et al. [64] use Real-Time Action Logic, a kind of real-time temporal logic, to formalize the semantics of UML state machines. Mikk et al. [73] build the semantics of statecharts from Extended Hierarchical Automata, and Seshia et al. [94] translate statecharts into Esterel. Once the translation is 'correctly' done, the model analysis techniques available in the underlying formalisms can be applied directly to the resulting semantic models.

This survey is by no means exhaustive; it is rather a brief overview of the works that are most relevant to ours. For a more complete list of literature in this area of research, interested readers can refer to the UML bibliography maintained by Richters [87] at the University of Bremen, Germany.

2.4 Formalization Issues

The impact that the lack of precision necessary for rigorous analysis has on the use of modeling techniques in industrial settings has been widely recognized [43]. Rumpe [92] and Harel et al. [48] clarify the main concepts involved in the formalization of modeling languages, with emphasis on UML and its modeling techniques. Formalization of a language may involve the syntax that characterizes all possible expressions of the language, a semantic domain, and a semantic mapping from the syntactic expressions to the semantic domain. The mapping from syntax to semantics is usually intensional rather than extensional, which means that the mapping is not explicit [58].

In the formalization of OO modeling techniques, the choice of a formalization approach and of the underlying semantic domain is among the major decisions we have to make. The semantic domain should allow us to precisely and completely describe properties of models and to reason rigorously about the models, which in turn strengthens verification and validation of the models [42]. Moreover, the semantic domain should have mechanisms that express relationships among models, e.g. compositions and refinements, and should support model analysis, e.g. consistency checking.
In the rest of this section, we briefly discuss the notions of composition, consistency, refinement, and formal reasoning (i.e. model checking and proof checking) in the UML context.

2.4.1 Composition of UML Models

UML is a collection of several modeling techniques: statecharts, message sequence charts, etc. Describing a given system using a single UML modeling technique captures only one aspect of the system, resulting in a partial specification. For instance, UML class diagrams are effective in describing structural aspects of a system, whereas sequence diagrams are suitable for describing temporal properties of the system. To obtain a complete specification of a system, it is necessary to combine several descriptions given in different modeling techniques. Combining several modeling techniques in a system development project results in a more expressive framework. Such an integration requires formal semantic definitions of the notations involved in a common semantic domain. The latter paves the way for rigorous analysis, and for underpinning practical CASE tools that support the development framework with a semantic foundation.

Effective use of a multi-notation development framework requires a number of issues to be addressed.
- How can we combine partial specifications given in different modeling techniques and notations into one model?
- How can the results of the analysis of different models be integrated in such a way that results from one analysis can be used in the other?
- How can we maintain consistency of the overall system specification obtained from composition¹ of several partial specifications?
For instance, given a complete² description of the structural aspect of a system by a set of class diagrams CD, and a description of the interactions among components of the system by a set of sequence diagrams SD, the following requirement must be fulfilled:
- For any sequence diagram in SD and any object participating in the interaction specified by that sequence diagram, the class of the object must be described in CD.

Properties that need to be established between a class diagram and a statechart associated with a class specified in the class diagram can be described in a similar way. Combining different modeling techniques in order to obtain a more complete description of the system is highly desirable, as a single UML model provides only a partial specification that focuses on certain aspects of the system.

2.4.2 Checking Consistency of UML Models

The method integration approach is a way of combining several notations and/or methods into a single development platform. Such a combination may raise the problem of consistency within and across the boundaries of the languages involved in the integration. In general, consistency issues that may arise in this context are classified into two categories: internal consistency checking, which ensures that models in the same notation do not introduce contradictory requirements; and external consistency checking, which deals with consistency problems across the boundaries of different notations [9, 14]. The two categories are not mutually exclusive, as there are several notations that are combinations of other notations. In the case of UML, for instance, consistency between statechart models and sequence diagram models can be considered either as an internal consistency issue within the UML notation or as an external one across the statechart and message sequence chart (MSC) notations.
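The requirement stated in Section 2.4.1, that the class of every object participating in a sequence diagram be described in CD, lends itself to a mechanical check. The following Python sketch is only an illustration under simplifying assumptions: CD is reduced to a set of class names, each sequence diagram in SD to a map from object names to class names, and the function name and the sample data are hypothetical.

```python
def undeclared_participants(class_names, sequence_diagrams):
    """Return (diagram, object, class) triples whose class is missing from CD."""
    declared = set(class_names)
    return [
        (diagram, obj, cls)
        for diagram, participants in sorted(sequence_diagrams.items())
        for obj, cls in sorted(participants.items())
        if cls not in declared
    ]


# Hypothetical models: CD declares two classes, SD contains two diagrams.
CD = {"Customer", "Account"}
SD = {
    "withdraw": {"alice": "Customer", "acc1": "Account"},
    "audit": {"log": "Logger"},  # "Logger" is not described in CD
}

violations = undeclared_participants(CD, SD)
```

An empty result means the requirement holds; each reported triple names a diagram, an object, and the class that CD fails to describe.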
In the integrated platform we proposed for the development of distributed systems [107], checking both internal and external consistency is necessary. A framework for consistency checking was described in [107], where the system specification is given within a development environment that integrates the UML notation and its CASE tool, the OUN formalism, and the PVS toolkit. This approach is based on the decomposition style we adopted in the development platform, i.e. a codification of how concerns are separated and how the languages are built on one another, and it covers the development process from requirement capture to code generation.

¹ Composition should not be confused with physical containment, a variant of aggregation.
² Completeness in the sense that the structures of objects in the system are fully described.

A literature survey shows that there are several articles addressing the problem of checking consistency in general [9, 14, 50]. In [50], Heitmeyer et al. proposed a technique for checking consistency of requirement specifications given in the SCR (Software Cost Reduction) [51] method. They developed a suite of prototype tools, which includes a specification editor, a consistency checker, and a simulator. Other articles focus specifically on the consistency of UML models [2, 24, 111, 84, 59]. Paige et al. [84] present a formal and mechanized approach to checking consistency constraints between UML class and collaboration diagrams. Consistency constraints are formulated as a formal and machine-checkable specification so that the PVS theorem prover can be used for checking consistency and verifying the constraints. The constraints ensure, for instance, that the messages in a collaboration diagram are legal with respect to the pre- and post-conditions of the methods in a class diagram. Chiorean et al. [24] present a process for checking consistency of UML models against a set of rules: methodological rules, e.g.
well-formedness rules for UML models; application-profile-dependent rules, e.g. for web applications; and target programming language rules. The process is based on the OCL formalism for the specification of all categories of the rules. It is known as the Object Constraint Language Environment (OCLE) and is automated by the OCLE tool [23]. The rules concerning the consistency of UML models are defined at the meta-level and can hence be reused for any UML model. The approach by Krishnan [59] to checking consistency of UML models is similar to ours. UML diagrams are formally represented in terms of state predicates, i.e. boolean functions on the set of states. The approach supports translation of various UML diagrams into state predicates defined in the PVS specification language. The PVS theorem prover is used to verify consistency between various diagrams. It is claimed that the approach enables consistency checks even for partially specified diagrams, e.g. sequence diagrams.

2.4.3 Refinement

In a software development process, it is practically impossible, starting from scratch, to achieve a deliverable product in a single step. Starting with a description of system requirements at a high level of abstraction, usually received from a client with little or no knowledge of software engineering, we systematically add more details until we achieve a full implementation of a system with the intended structural and behavioral properties. The process by which an abstract model of the system (containing little implementation detail) is incrementally transformed into a model that can readily be implemented in a specific programming language is known as refinement. While refinement in traditional textual languages involves manipulation of textual syntactic expressions, in languages with graphical syntax, like UML, refinement should be thought of diagrammatically. In other words, a refinement of UML models implies diagrammatical transformations.
Moreover, because UML combines several graphical modeling techniques to describe a complete system, a complete refinement step may require several graphical transformation frameworks. In a refinement process, correctness of the refined (i.e. the specialized and/or detailed) model must be verified against its abstract counterpart(s). Formal semantic definitions of UML modeling techniques can be used as a foundation for developing refinement rules for UML.

In the UML standard document v1.3 [79], the notion of refinement is used to represent a greater level of detail. It is a kind of dependency relationship between an element that has already been specified at a certain level of detail and its refinement, which includes more details. For instance, a class in an analysis model may have a refined counterpart in a design model, and an even more refined one in an implementation model. Since the distinction between refinement and generalization is valid only in implementation models [40, 58], at higher abstraction levels the representation of generalization as subtyping in PVS-SL can capture refinement as well. For a detailed discussion of the current state of the semantics of refinement and other relationships such as generalization, realization, etc., the interested reader can refer to the work by Kent et al. [58]. Because refinement in UML is defined as a relationship between modeling elements and not between complete diagrams, an important open issue, as mentioned in [58], is to define refinements of complete UML diagrams.

2.4.4 Formal Reasoning

Providing a formal definition for the semantics of OO modeling techniques is not a goal in itself. The ultimate goal of formalization is to develop a framework that supports rigorous analysis of models. Formal verification has been proposed for checking safety and liveness properties in the context of critical systems.
The two well-established approaches to verification are model-theoretic reasoning, where a certain temporal formula is checked against the model in question, and proof-theoretic reasoning, where logical deductions are used to demonstrate that a given property of the model, usually stated as a theorem, is a logical consequence of a set of axioms [76]. In reasoning about UML models, the model-theoretic approach is suitable for checking temporal properties, usually modelled by sequence diagrams, whereas proof-theoretic reasoning is efficient for checking consistency of models. Our development platform supports these model analysis techniques by relying on the PVS theorem prover and model checker. Typically, formal reasoning can be used to verify consistency between (possibly partial) system descriptions given in different UML modeling techniques (see section 3.7), or between the UML and OUN notations (refer to paper [VI] in appendix F).

Chapter 3 Summary of Contributions

A software development method is a unified process incorporating several description techniques to characterize different aspects of a system. In a development process, a software system goes through several phases during its life cycle, from requirement capture, to analysis, to design, to testing, and to code generation. At each stage of development, system specifications at various levels of abstraction, focusing on different aspects of the system, should be provided using suitable description techniques. To satisfy these requirements, UML [79] combines several modeling techniques and graphical notations that allow descriptions of different aspects of a system, i.e. static structural, dynamic behavioral, and administrative aspects. However, the UML diagrammatical descriptions are essentially informal and not suitable for precise analysis. The contemporary UML standard document (v1.3) [79] provides the semantics of UML modeling techniques in a natural language, namely English.
There are now numerous attempts at giving a formal semantics to fragments of UML using different approaches. Some replace informal object-oriented (OO) notations with more formal ones; some extend novel or existing formal notations with OO features. These approaches are neither user friendly nor easily scalable, mainly due to the esoteric nature of formal methods and the lack of CASE tools. A more workable approach, adopted in this work, integrates OO modeling notations with suitable formal specification languages (see Section 2.2). We chose the PVS specification language [81] as the underlying semantic foundation. The choice of the PVS environment as semantic domain is dictated by its capacity to provide a very general semantic foundation, a highly expressive specification language, powerful mechanisms for rigorous model analysis, and strong tool support. The benefits of using the PVS environment also include facilities to describe invariant conditions that need to be maintained, and the availability of a mechanized theorem prover and a model checker integrated with the specification language.

In this chapter, we present a brief summary of the work done towards developing precise semantic definitions for a subset of UML modeling techniques, namely class diagrams, sequence diagrams, and statecharts, by transforming them into semantic models within the Prototype Verification System (PVS) [83, 81, 82].

Remark 3.1 The versions of the papers included in the sequel are revised versions of the published ones. The revisions consist of reformatting to fit them into the layout of the thesis, slight changes in content, and corrections of typographical errors.

3.1 Formal Development of Distributed Systems

The need for modeling dynamically reconfigurable and extendible distributed applications has made the dynamic features of object-oriented programming languages a very popular area of research.
We argue that there is no single specification technique or method, at least known to us, that has the capacity to describe all aspects of contemporary distributed applications, such as openness, dynamic reconfigurability, and extendability. The focus of paper [I] is the integration of semi-formal modeling notations and formal specification languages into a single framework. It presents an approach towards providing an industrially applicable framework for the formal development of open distributed systems (ODSs). A multi-formalism approach to the formal development of ODSs is proposed: existing development techniques are adapted, extended, and integrated to cover different aspects of the software development process from requirement capture to code production. In this regard, we decided to integrate the Unified Modeling Language (UML) [79] and the Oslo University Notation (OUN) [80], using the PVS specification language as a common underlying semantic foundation.

UML is a graphical and object-oriented industry standard modeling language that is easy to learn and use. UML supports modularization, structuring, reusability, and dynamic and multiple classification. In UML, unlike in most OO languages, objects are typed dynamically and there is a complete separation between specifications given as interfaces and their implementations by classes. These are among the main features that make UML suitable for the description of ODSs.

Despite the above benefits, UML suffers from several limitations in the context of formal system development. Firstly, its graphical modeling constructs are not sufficient to achieve a complete and precise description of systems. For instance, invariants and constraints on classes and types, and abstract definitions of operations and attributes, cannot be described precisely. Secondly, since the semantics of UML constructs is provided informally, in a natural language, rigorous analysis is not supported.
The first deficiency can be compensated for by using UML in combination with a more expressive notation like OUN. OUN is a formal specification language that takes into account the limitations of traditional formalisms by addressing major issues related to the development of ODSs. It supports dynamic typing by allowing addition and removal of classes and interfaces from a specification. In OUN, objects are specified by means of invariants on historic information, i.e. finite or infinite traces of parameterized events that describe interactions between the objects and their environments. The second deficiency, i.e. the lack of formal semantic definitions for the UML constructs, is addressed by transforming semantic notions of UML modeling techniques into the PVS specification language [4, 5, 6].

Implementation of the integrated development framework proposed in [I] raises the following research issues, among others:
- Formal semantics of the notations of UML and OUN need to be provided in the PVS specification language. The work published in [4, 5, 6] and summarized in Sections 3.2-3.4 below deals with the formalization of the semantics of UML modeling constructs. A semantic definition for the OUN notations in PVS is proposed by Johnsen [55].
- Interaction between several specification languages, namely UML and OUN, gives rise to a number of consistency issues. This problem is the theme of our work reported in [106] and summarized below in Section 3.5.
- Refinement proof rules should be defined. This issue is among the research topics to be addressed in the future.

A CASE tool that supports the integrated development framework is crucial for the application of the framework in industrial settings. We developed a prototype of a platform that integrates a UML CASE tool (Rational Rose [27]), the OUN tool, and the PVS tools. The purpose is to combine the benefits of CASE tools for graphical modeling with the benefits of the PVS analysis tools in a single platform.
The platform is intended to support automatic transformation of graphical models into formal semantic models, and rigorous analysis of the models using the PVS verification tools.

In paper [II] we illustrate the practical application of the proposed development framework and the supporting tool by presenting a case study of the IEEE 1394 tree identify protocol. The development platform is used to specify and verify properties of the protocol. The UML modeling techniques are used for system specification, whereas complementary semantic properties are captured using OCL expressions. The UML models and the OCL expressions are translated into PVS specifications in order to verify properties using the PVS proof system.

In paper [IX] the practical usability of the formal development framework and the supporting tool is demonstrated by presenting an example of the development of a critical system, namely a banking system. We discuss how the major components of the development framework, e.g. the semantic definitions for the UML notations, the formal V&V strategies, and the PrUDE tool, can be used in formal system development. We argue that the proposed framework contributes to improving the use of formal methods in the development of highly dependable systems in industrial settings.

3.2 Semantics of Structural UML Models

The focus of the work reported in paper [III] is the formalization of the UML structural description techniques. Formal semantic definitions for basic elements of UML class diagrams are proposed; well-formedness rules for the graphical models and invariants that have to be maintained are formally expressed, and their correctness is argued. In UML, static structural models of a system are described by class diagrams and object diagrams. UML class diagrams are the most stable and widely used part of UML, since they translate in a straightforward way into implementation classes [100].
A UML class diagram consists of a set of basic modeling constructs such as classes that describe the data structure of objects that may exist in the system, and relationships between the classes (strictly speaking, between objects of the classes). A class specifies attributes and operations of a set of objects that share structural and behavioral properties. Relationships that may exist among objects, such as association, aggregation, and generalization, are used to classify objects, and therefore simplify the overall structural representation of the system design. The structure of UML class diagrams implies that we need reference semantics for an adequate description; otherwise it would not be possible to express relationships between classifiers properly.

The objective of the work reported in [III] was to provide formal semantic definitions for structural UML modeling techniques, and to propose a mechanism for rigorous reasoning about static structural properties of models. This is achieved through the following steps:
- Basic semantic concepts and modeling constructs such as classes, interfaces, and relationships are encoded in the PVS specification language. Conditions that need to be fulfilled for syntactic correctness of each modeling construct, i.e. criteria for the well-formedness of diagrammatic modeling elements, are also described in the PVS specification language.
- The semantics of system models described by UML class diagrams is defined in terms of the basic entities represented in the PVS specification language. Well-formedness rules and required properties of the models are specified and rigorously analyzed. The transformation also allows precise description and proof of system-specific properties by invoking the PVS theorem prover.

For instance, a class is encoded as a record type whose fields capture the signatures of the attributes and operations of the class. A relationship is specified as a relation, i.e.
a set of ordered pairs, on the classifiers involved in the relationship. An association, for example, is a relation on association ends, i.e. the ends to which a classifier, its role, and its multiplicity are attached. A class diagram is then defined as a PVS theory that consists of the specification of a set of classifiers and a set of relationships. Well-formedness rules for class diagrams are obtained from the conjunction of the well-formedness rules for their components and some additional global requirements, such as uniqueness of identifiers across the model.

Transforming UML class diagrams into PVS specifications enables us to precisely express and reason about the static behavior of the system specified by the class diagram. The formalization framework captures object-oriented notions such as polymorphism, inheritance, and encapsulation, and preserves the structure of models as much as possible. The integration approach reveals ambiguities that may not have been detected directly from the graphical UML models, while preserving the simplicity of OO modeling techniques.

Transformation of a graphical UML model of a real-world-size system into PVS may involve processing a large quantity of software artifacts. Hence, mechanized tool support is necessary. In this regard, a multi-formalism platform [104] that integrates a UML CASE tool, Rational Rose [27], and the PVS tool set [96, 95, 82] has been developed to automate the transformation and model analysis. This supports the formal development cycle of distributed systems from requirement capture to final code production.

3.3 Semantics of UML Sequence Diagrams

The work reported in [IV] focuses on the formal semantics of a behavioral UML description technique, namely the sequence diagram. The UML sequence diagram [79] is a variant of the classical Message Sequence Charts (MSCs) [53, 25]. MSCs are graphical modeling notations for describing interaction among system components, for example in specifications of telecommunication systems.
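The class diagram encoding summarized in Section 3.2 (classes as records, associations as relations on association ends, and well-formedness as a conjunction of component rules plus global uniqueness of identifiers) can be illustrated outside PVS. The following Python sketch is an analogue under simplifying assumptions, not the PVS theory developed in the thesis; all type and function names are hypothetical, and signatures are reduced to plain names.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ClassSpec:
    """A class as a record of attribute and operation names."""
    name: str
    attributes: Tuple[str, ...] = ()
    operations: Tuple[str, ...] = ()


@dataclass(frozen=True)
class AssociationEnd:
    """An end carries a classifier, its role, and its multiplicity."""
    classifier: str
    role: str
    lower: int
    upper: Optional[int]  # None stands for an unbounded upper limit


@dataclass(frozen=True)
class ClassDiagram:
    classes: Tuple[ClassSpec, ...]
    associations: Tuple[Tuple[AssociationEnd, AssociationEnd], ...] = ()


def class_ok(c: ClassSpec) -> bool:
    """Component rule: a class is named and its member names are unique."""
    members = c.attributes + c.operations
    return bool(c.name) and len(set(members)) == len(members)


def end_ok(end: AssociationEnd, declared: set) -> bool:
    """Component rule: declared classifier and a sensible multiplicity."""
    bounds_ok = end.lower >= 0 and (end.upper is None or end.lower <= end.upper)
    return end.classifier in declared and bounds_ok


def well_formed(d: ClassDiagram) -> bool:
    """Conjunction of component rules plus global uniqueness of class names."""
    names = [c.name for c in d.classes]
    declared = set(names)
    return (
        len(declared) == len(names)
        and all(class_ok(c) for c in d.classes)
        and all(end_ok(e, declared) for pair in d.associations for e in pair)
    )
```

A diagram with classes `Customer` and `Account` and one association between them passes the check; duplicating a class name or attaching an end to an undeclared classifier makes `well_formed` return `False`.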
MSCs are a well-accepted description technique incorporated into a number of practical modeling languages, including UML.

A dynamic model of a system describes valid changes in system states and the conditions under which a change in state may occur. Interactions among system components are captured by modeling occurrences of events such as sending a message, receiving a message, invoking an operation, etc. The UML sequence diagram is among the dynamic models used to specify dynamic system behavior. A sequence diagram makes the time ordering of interactions explicit, yet hides structural relationships among the objects participating in the interaction. A sequence diagram describes either a single execution thread or a procedural view of all allowable decision paths available for execution. In the former case, a sequence diagram models a scenario, whereas in the latter case it models a use case.

A single sequence diagram describes a segment of interaction, and provides only a partial specification of a system. To obtain a complete specification of the system, it is necessary to use a collection of sequence diagrams complemented with other models such as class diagrams and statechart diagrams. When several UML modeling techniques are used in combination, the validity and consistency of the resulting system model must be taken care of, since such a combination of partial specifications given in different description techniques may introduce inconsistency. To address consistency issues and to undertake model analysis, the development process should be augmented with rigorous analysis techniques, which in turn require formal semantic definitions for the modeling constructs. In this regard, we provide semantic definitions for UML sequence diagrams by expressing them in the PVS specification language.

A sequence diagram models interactions among objects that exist in a system and/or between the system and its environment.
An interaction involves message communications, which in turn involve event occurrences. A message communication is a pair of event occurrences: a message send event and a message receive event. The semantics of a sequence diagram is defined as a set of traces of events that may occur on the objects participating in the interaction specified by the sequence diagram. A trace models a single possible execution thread. Trace-by-trace projection of the set of traces representing a sequence diagram onto the alphabet of an object, i.e. the events that occur on the object, results in a representation of the behavior of the object. The semantic definition of sequence diagrams requires definitions of other semantic notions such as events, actions, objects, operations, etc., which are also provided. General requirements on sequence diagram models, e.g. causality (a message must be sent before it is received), are stated as predicates on traces. The partial ordering of events on an object in a sequence diagram is preserved by using sequences of events rather than multi-sets; the latter representation can be derived by considering all possible sequences that give rise to a given multi-set [86]. Moreover, requirements that ensure well-formedness of sequence diagram models are also specified.

A case study of a telecommunication network is presented to illustrate an integrated use of UML sequence diagrams and class diagrams in the formal development of distributed systems. The case study also shows how the PVS tools can be used to perform rigorous analysis of models that are obtained by transforming UML constructs into the PVS specification language.

3.4 Semantics of UML Statecharts in PVS

The work reported in [V] focuses on the semantics of a behavioral UML modeling technique, namely statecharts [79], and on the description of well-formedness properties of dynamic UML models. UML statecharts are an object-oriented variant of the classical Harel statecharts [47].
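The trace semantics of sequence diagrams summarized in Section 3.3 can be made concrete with a small example. The sketch below is written in Python and only mirrors the ideas stated there (a diagram as a set of event traces, causality as a predicate on traces, and projection onto the events of one object); it is not the PVS formalization of paper [IV], and the event representation and the sample messages are assumptions.

```python
from typing import NamedTuple


class Event(NamedTuple):
    kind: str     # "send" or "receive"
    message: str  # message identifier
    obj: str      # object on which the event occurs


def causal(trace):
    """Causality: every receive is preceded by the send of the same message."""
    sent = set()
    for event in trace:
        if event.kind == "send":
            sent.add(event.message)
        elif event.message not in sent:
            return False
    return True


def project(traces, obj):
    """Trace-by-trace projection onto the events that occur on obj."""
    return {tuple(e for e in trace if e.obj == obj) for trace in traces}


# Hypothetical diagram: object a sends m1 to b and then m2 to c.
s1, r1 = Event("send", "m1", "a"), Event("receive", "m1", "b")
s2, r2 = Event("send", "m2", "a"), Event("receive", "m2", "c")

# The diagram denotes every interleaving that respects a's local order
# and causality.
diagram = {
    (s1, r1, s2, r2),
    (s1, s2, r1, r2),
    (s1, s2, r2, r1),
}
```

Projecting `diagram` onto `a` collapses the three interleavings into the single local behavior `(s1, s2)`, which illustrates how the behavior of one object is recovered from the set of global traces.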
The classical statecharts are a visual formalism, which can be seen as a generalization of conventional finite automata to include features such as hierarchy, orthogonality, and broadcast communication between system components. Although statecharts are a formalism, their semantics is not unique across the various implementations, and statechart specifications can furthermore be nondeterministic [94].

One of the main differences between UML statecharts and the classical statecharts is that the former specify the behavior of a type, whereas the latter specify the behavior of processes. In fact, the notion of a process is not supported by UML statecharts. Classical statecharts assume zero-time transitions, but a transition may take some time in UML statecharts. In UML, event broadcasting is not supported, but it can be simulated by sending messages to a set of identified objects.

A UML statechart is associated with a specific modeling element, usually an object or an interaction, and describes the complete life cycle of the element by describing its reaction to events. The association with a modeling element provides the context of the statechart. An object has both static structural and dynamic behavioral aspects. Static structural aspects of objects are described by classifiers in UML class diagrams, whereas behavioral aspects are described using dynamic models such as statechart diagrams and interaction diagrams. A typical application of statecharts is in modeling the behavior of reactive objects. A UML statechart diagram is a directed graph whose vertices are states and whose arcs are transitions between the states.

The focus of contribution [V] is providing semantic definitions for UML statecharts. Using the PVS specification language as the underlying foundation, the semantics of the basic entities and concepts of UML statecharts, such as states, transitions, events, and actions, as well as well-formedness requirements, are formally defined.
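The view of a statechart as a directed graph whose vertices are states and whose arcs are transitions can be sketched as a small data structure. The Python sketch below covers flat statecharts only (hierarchy, orthogonality, guards, and timing are deliberately left out) and is not the PVS encoding of paper [V]; the type names, the step function, and the sample states of a data server are hypothetical.

```python
from typing import NamedTuple


class Transition(NamedTuple):
    source: str
    event: str
    target: str
    action: str = ""


class Statechart(NamedTuple):
    states: frozenset
    initial: str
    transitions: tuple


def well_formed(sc):
    """The initial state and all transition endpoints are declared states."""
    return sc.initial in sc.states and all(
        t.source in sc.states and t.target in sc.states for t in sc.transitions
    )


def step(sc, state, event):
    """Return the set of (next state, action) pairs enabled by event.

    Returning a set keeps possible nondeterminism visible. An event with
    no matching transition is assumed here to leave the state unchanged.
    """
    enabled = {
        (t.target, t.action)
        for t in sc.transitions
        if t.source == state and t.event == event
    }
    return enabled or {(state, "")}


# Hypothetical data server with two states.
server = Statechart(
    states=frozenset({"Idle", "Serving"}),
    initial="Idle",
    transitions=(
        Transition("Idle", "request", "Serving", "fetch"),
        Transition("Serving", "done", "Idle", "reply"),
    ),
)
```

Properties such as "every `request` received in `Idle` leads to `Serving`" then become simple statements about `step`, which is the kind of requirement the thesis states and verifies in PVS.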
The semantics of UML statecharts is defined in terms of the basic semantic entities in the PVS specification language. Finally, important properties of UML statecharts are specified and proved using PVS tool support.

The characteristic feature of the formalization is that UML statecharts can be effectively transformed into PVS, and hence the verification tools of PVS can be used to verify UML statecharts as well. This functionality of the transformation framework is illustrated by a case study of a data communication platform. A data server, a component of the platform, is modelled as a UML statechart. The statechart is translated into a PVS specification. Properties of and requirements on the data server are specified and can be verified using the PVS tools.

3.5 Tracking Inconsistencies in Integrated Platforms

The focus of paper [VI] is on issues that may arise in the context of the integration of semi-formal languages with formal methods in the development of distributed systems, e.g. consistency within and across language boundaries.

There are numerous development techniques and notations in software engineering. Different methods have strengths and limitations with respect to aspects of software development. Some methods have formal and highly expressive specification languages that allow precise and unambiguous description of systems, yet require more effort to use effectively due to their esoteric nature. Others have visual and intuitively appealing specification notations that are easy to learn and use, and support modularization and structuring mechanisms, yet lack the underlying mathematical foundation necessary for formal system development. To tackle the increasing complexity of contemporary distributed software systems and, at the same time, provide the required level of confidence in critical systems, a development method that integrates suitable methods and notations is necessary.
This approach, known as method integration (see section 2.2), results in a development framework that exploits the strengths of well-established formal methods and modeling techniques. A major drawback of the method integration approach is the cost of identifying and removing the conflicts and inconsistencies that may unavoidably be introduced, which are among the major sources of errors [78]. In order to improve the quality and productivity of the software development process, it is necessary to identify inconsistencies and errors in the earlier phases of development, where fixing them is far cheaper than in later phases.

Contribution [VI] investigates consistency issues that may arise from the integration of the UML [79] and OUN [80] notations into a single development platform using the PVS environment as the underlying semantic foundation. Modeling constructs of the UML and OUN notations are translated into semantic entities in the specification language of PVS [83]. Representing the involved notations in a common domain, namely the PVS specification language, reduces the problem to one of internal consistency. Moreover, it makes the PVS tools available for verifying system properties, e.g. consistency, that must hold in the integrated development framework. A general approach to inconsistencies across language borders, based on semantic equivalence between constructs in the languages involved in the integrated framework, is proposed.

3.6 Enhancing Structured Reviews with Model-Based Verification

Article [VII] describes an approach for including model-based correctness arguments in human-based review approaches. In this way, we are in a position to automate parts of the tedious and time-consuming defect detection task. Moreover, we describe a case study we have performed to demonstrate the usability of the approach.
We argue that such an integration enhances structured design reviews and improves detection of errors and deficiencies in earlier phases of development, when the cost of maintenance is lower. We discuss a set of correctness arguments that can be used in conjunction with formal validation and verification (V&V) in order to improve the quality and reliability of critical systems in a cost-effective way. We demonstrate practical usability of the proposed approach by presenting a case study of a critical system.

The purpose of formalizing the semantics of object-oriented modeling techniques is to compensate for their lack of the rigor necessary for model analysis and to avoid misinterpretations of models. Transforming graphical models into semantic entities in a given formalism makes the verification and validation (V&V) mechanisms of the underlying formalism readily available. CASE tool support for the modeling techniques and formalisms can also be integrated to automate design, analysis, and V&V of the system in question. Unfortunately, not all aspects of system design and analysis can be mechanized. Hence, there is a need for systematic manual reviews to handle the aspects of V&V that cannot be automated. The level of quality obtained with conventional V&V techniques may not be sufficient for critical systems where a failure may result in significant economic losses, physical damage, or threat to human life. Achieving a high level of dependability (i.e. availability, reliability, safety and security) is usually the most important quality criterion that must be met before launching a software system. Although better reliability can be achieved by using formal development techniques, the esoteric nature of formal methods imposes a significant barrier to their large-scale utilization. To overcome this barrier, several strategies for introducing formal methods into the software development process have been proposed in the literature [44, 3, 67].
Most of the strategies integrate the strengths of formal and semi-formal methods [49, 35, 108]. For instance, in [67] a visual formalism based on tabular descriptions is first used to write the specification, whereas the verification is performed by automatically generating a PVS model from the tables and invoking the PVS theorem prover. Our work draws on the same principle by highlighting the major limitations of formal V&V and by compensating for them with alternative strategies to facilitate the large-scale utilization of formal methods. We proposed an integrated V&V approach based on the concept of lightweight formal methods and structured design reviews.

3.7 Summary of Major Achievements

The objective of this work is to contribute towards formal development of open distributed systems by integrating the strengths of semi-formal graphical modeling notations and formal methods. In this regard, several results are achieved: precise semantic definitions for a subset of UML notations; a formal development framework for open distributed systems; and a prototype of a CASE tool, which supports automation of the development framework. The rest of this section briefly summarizes the results.

3.7.1 Semantic Definitions for UML Notations

Graphical UML models are informal system descriptions and not precise enough to perform rigorous analysis. There have been numerous attempts to provide formal semantics to UML models either by translating them into textual formal languages [69] or by using the object constraint language (OCL) to express constraints such as invariants and pre- and post-conditions that must be satisfied [24]. The purpose of integrating semi-formal modeling techniques with formal methods (FMs) is to exploit the mathematical foundation underlying FMs to rigorously analyze models and to reveal subtle errors that may not be discovered otherwise.
This requires transformation of graphical models into mechanically analyzable specifications in a formal specification language, which in turn requires formal semantic definitions for the graphical modeling constructs. In this regard, we proposed semantic definitions for the UML notations [4, 5, 6, 105] using PVS as the underlying semantic foundation. The resulting semantics is used as a basis of a formal development framework and a supporting CASE tool, namely, the PrUDE environment and its tool.

3.7.2 A Framework for Formal Development of ODSs

The lack of precise and unambiguous semantics for UML modeling constructs severely hampers the application of UML to the development of critical systems in industrial settings. Formalization of the semantics of the UML modeling techniques is the central theme of this work. Ultimately, we explore how the resulting semantic framework can be geared towards supporting formal development of open distributed systems. Because UML is a combination of several well-established modeling notations, e.g. statecharts [47] and message sequence charts (MSCs) [53], both inter- and intra-language consistency issues need to be addressed. Static UML models such as class diagrams describe structural properties of a system, whereas dynamic models such as statechart diagrams and sequence diagrams capture the behavior of the system. To obtain a complete description of a system, combined use of the static and dynamic models is necessary. That is, in a software development project, several modeling notations and techniques need to be combined in order to provide a complete system specification that captures important aspects at various levels of abstraction in different phases of the software development process. Although the order of usage of the different UML modeling approaches is rather orthogonal, it is necessary to maintain correctness and consistency across the resulting specifications.
This in turn calls for precise semantic definitions of the constructs of the UML notations to facilitate rigorous analysis of individual models, i.e. to verify that the models are correct and consistent and that the resulting system satisfies the requirement specifications. In the formalization of notations that combine several modeling techniques, a common underlying semantic foundation is vital. Transforming the modeling notations into a single semantic domain not only significantly simplifies internal consistency problems, but also improves the verification and validation process. We have proposed the integrated framework shown in Figure 3.1 for formal development of distributed systems.

Figure 3.1: Formal Development Framework for ODSs (diagram labels: User requirements; OUN partial spec.; UML partial spec.; Validation; Refinement; UML design model; OUN design model; Verification; Code generation; Code)

- From user requirement specifications, developers provide analysis models using suitable UML notations and OUN notations based on a given decomposition style. The decomposition style determines which aspects of the system should be described using which modeling notation. This may result in two partial specifications that describe different aspects of the system.
- The specification in UML notations is translated into a design model in OUN where analysis facilities are used to validate the models. It may also be necessary to translate the OUN design model back to UML, and the translation between UML and OUN models can be repeated until the developer is satisfied with the models.
- The UML and OUN models are refined to obtain design models, which are transformed into semantic models in the common underlying semantic foundation, i.e. the PVS specification language, based on the proposed formal semantics for the UML and OUN notations and the transformation rules (refer to papers I-IV and [55]).
- The semantic models, i.e.
specifications in the PVS specification language, are verified and validated using the formal reasoning facilities provided by the PVS environment. Although most of the V&V steps can be mechanically performed using PVS tools such as the theorem prover and model checker, some still require manual review (refer to paper VII).
- If the V&V of the PVS specifications is successful, the corresponding UML models are valid. If it fails, assuming that the translation of the UML models is correct, the UML models must be reviewed based on the feedback from the V&V procedure.

Most of the steps in the development process are iterative. For instance, if a verification discovers an error in a UML model, we need to fix it in the UML model and transform it into a semantic model again. These iterative steps are depicted in Figure 3.1 by two-way arrows. The above formalization approach, together with the proposed framework for development of distributed systems, contributes to the formal development process in the following ways:

- Formally representing the graphical modeling language in the PVS specification language enables us to clarify the language and to develop precise UML models and prove their correctness. Representation of diagrammatical UML models in the PVS specification language not only results in specifications amenable to rigorous analysis but also makes PVS theorem-proving and model checking readily available for validation and verification of the resulting system specification.
- Model correctness properties and well-formedness rules, provided in the semi-formal object constraint language (OCL) and in natural language, are formally expressed.
- System modeling results in descriptions of a system at a higher level of abstraction, leaving out details. This allows developers to focus on analysis and design of important aspects of the system, which in turn may result in detection of errors and/or deficiencies at earlier phases of development.
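
To make the idea of a formally expressed well-formedness rule concrete, the following is a minimal sketch in Python rather than in PVS. It does not reproduce the encoding proposed in papers I-IV. The dictionary representation of a class diagram and the function names `all_parents` and `circular_classes` are assumptions introduced only for illustration. The rule checked, that no class may be its own ancestor through generalization, is used here as a representative example of a well-formedness rule.

```python
from typing import Dict, List, Set

# Hypothetical class-diagram fragment (an assumption of this sketch):
# each class name maps to the list of its direct superclasses.
ClassDiagram = Dict[str, List[str]]


def all_parents(diagram: ClassDiagram, cls: str) -> Set[str]:
    """Return every class reachable from `cls` through generalization links."""
    seen: Set[str] = set()
    stack = list(diagram.get(cls, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(diagram.get(parent, []))
    return seen


def circular_classes(diagram: ClassDiagram) -> List[str]:
    """List the classes that violate the rule 'no class is its own ancestor'."""
    return sorted(c for c in diagram if c in all_parents(diagram, c))
```

In a theorem-proving setting such a rule would be stated as a property to be proved rather than computed. The sketch only shows that, once the diagram has a precise representation, the rule becomes mechanically checkable.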
3.7.3 CASE Tool Support

Remark 3.2 The two CASE tools, namely the Integrator [104] and the PrUDE [7], were developed in connection with the works included in this thesis. I was directly involved in the development of the Integrator platform, and it is based on the semantic definitions I proposed for the UML notations. In the case of the PrUDE tool, however, my contribution was rather indirect: I defined the formal semantics for a subset of the UML notation on which the implementation of the PrUDE tool is based. The PrUDE tool was developed at the Department of Electrical and Computer Engineering, University of Victoria, Canada, by Dr. Traoré and members of his research team.

Application of the strategy to a large-scale project may involve manipulation of large volumes of data. Thus, automation is an essential aspect of the development framework. In this regard, we have developed a prototype of a platform, called Integrator [109], which integrates formal methods with suitable existing graphical object-oriented notation(s). The graphical object-oriented notations are easy to learn and use, and in most cases they have industrial-strength tool support.

Figure 3.2: A Snapshot of the Integrator Platform

In our case, a commercial UML CASE tool, namely Rational Rose, the OUN tool, and the PVS toolkit are systematically integrated. The UML tool is used to deal with requirement capture and code generation, whereas validation and verification are supported by PVS tools such as the theorem-prover, model-checker, and type-checker. The platform allows developers to deal with the graphical models they have developed in UML while the formal "stuff" is processed by the PVS tools at the back end. In this way, the formal notation is hidden behind the graphical notation, and features of the formal notations are available for rigorous reasoning.
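
The division of labor between the graphical front end and the formal back end can be sketched as a small text-generation step. The Python below is an illustration only and is not the Integrator's implementation. The function name `to_pvs_theory`, the `_th` theory-name suffix, and the choice to render a class as a PVS-style record type are all assumptions of this sketch, and the generated text has not been checked with the PVS type-checker.

```python
from typing import Dict


def to_pvs_theory(class_name: str, attributes: Dict[str, str]) -> str:
    """Render one class as PVS-style theory text holding a record type.

    Hypothetical encoding for illustration; it is not the set of
    transformation rules defined in the thesis papers.
    """
    fields = ", ".join(f"{name}: {typ}" for name, typ in attributes.items())
    return "\n".join([
        f"{class_name}_th: THEORY",
        "BEGIN",
        f"  {class_name}: TYPE = [# {fields} #]",
        f"END {class_name}_th",
    ])
```

A developer would edit only the class model. A call such as `to_pvs_theory("DataServer", {"capacity": "nat", "busy": "bool"})` yields the text that a back-end tool would consume, which is the sense in which the formal notation stays hidden behind the graphical one.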
Chapter 4 Conclusions and Future Work

4.1 Conclusions

The semantic definitions for UML models provided informally in the current standard document lack the level of formality necessary to undertake rigorous analysis. Formal semantic definitions for UML modeling constructs can lead to a deeper understanding of the modeling concepts, which in turn can lead to a more mature use of model analysis techniques. As argued by Evans et al. [38], such insights can be gained by exploring consequences of particular interpretations, and by studying the effects of relaxing and/or tightening constraints on the semantic models. In this work, formal semantic definitions for a subset of UML modeling techniques are provided by translating them into a well-defined semantic foundation. Specifically, static structural models such as class diagrams, and dynamic behavioral models such as sequence and statechart diagrams are considered. Our approach to the formalization of UML notations is based on the method integration strategy [42], and we integrate the UML with the specification language of PVS [81, 82]. Integrating a semi-formal graphical modeling language with a formal method results in a development framework that combines the strengths of the modeling language and the formal method. For instance, the framework is easy to learn and use as it allows system developers to interact with the visual modeling notation on the front end, while rigorous analysis is carried out at the back end. Defining formal semantics of UML modeling techniques in PVS is a good starting point for developing an integrated framework for description of combined views of static and dynamic aspects of systems. The integrated framework preserves useful properties of the graphical UML notations, e.g. their intuitively appealing visual modeling constructs, whereas the PVS environment is used to reason about correctness of the models.
The resulting framework facilitates translation of the UML models into machine-analyzable semantic models in the PVS specification language. Moreover, it allows users to directly apply the PVS analysis techniques and tools such as the type-checker, theorem-prover, and model-checker to the resulting semantic models. Developing a platform that supports automation of the integrated framework is crucial since analysis and design of a software system may involve a large quantity of software artifacts. Such a platform facilitates rigorous reasoning about the system in question - support that is not available when only the graphical UML modeling techniques are used [99]. In order to realize mechanization of the framework, we have developed a prototype of a platform that integrates a commercial UML CASE tool, namely Rational Rose [27], the OUN [80] tool, and the PVS tools. The platform supports development of distributed systems (cf. Section 3.7) from requirement capture to code production. This work contributes to the ongoing effort to provide formal semantic definitions for UML models, with the aim of clarifying and removing ambiguities from the language as well as supporting the development of semantically based tools. It is also a part of a long-term vision to explore how the PVS tool set could be used to underpin practical CASE tools for analysis of UML models. One major advantage of our framework is its capacity to utilize existing powerful, well-established notations and formalisms and their respective CASE tools. This enables us to address limitations inherent in the contemporary notations, in the context of formal development of open distributed systems, by a synergy of the strengths of graphical modeling notations and formal reasoning techniques. The framework allows developers to deal with the graphical system descriptions while most of the formal 'stuff' is manipulated at the back end.
We strongly believe that masking the rigorous analysis with a graphical front end improves the use of formal development techniques in industrial settings. For a general-purpose modeling language like the UML, which incorporates almost all aspects of OO programming, it is difficult, if at all possible, to find a single formalism which can capture all its semantic aspects. Most research works focus on formalizing the semantics of a subset of UML notations using a suitable underlying semantic foundation. A major challenge facing the research community is how the formalization frameworks can be combined in order to obtain a formalization that captures all aspects of the UML notations.

4.2 Future Work

The task of UML formalization is not trivial and poses many problems. It is unrealistic to try to address all the issues of formalization of a huge modeling language like UML in a single thesis work. Our focus is to develop a generic framework for formal development of distributed systems, supported by semantically-based CASE tools. The framework can serve as a basis for further work. Some of the main features of UML that make its formalization more difficult than formalization of ordinary computer languages are the following: heterogeneity, multiple views, extensibility, and its status as a notation rather than a method.

- Heterogeneity - UML is a collection of heterogeneous semi-formal notations that use a variety of diagrams such as a variant of entity-relationship diagrams, statecharts, message sequence charts, etc. for different purposes.
- Multiview - A UML model of a system consists of many diagrams, each one describing a view of the system or some of its parts. It may happen that structural constraints on a class are specified in a class diagram, its local behavior is given in a state diagram, and its interaction with objects of another class is specified in a sequence diagram.
- Extensibility - UML provides mechanisms to extend its modeling elements, such as stereotypes, tagged values and constraints.
Use of OCL to describe constraints is, for instance, not mandatory, and OCL can be replaced by other languages.
- Notation - UML is a notation (or a modeling language) and not a method. It does not prescribe any particular development process. Thus, it can be used in different ways by different methods.

In the future, we will extend the framework to capture the features discussed above and other aspects such as patterns. Providing formal semantic definitions for UML notations is a prerequisite for reasoning about refinement steps, relationships between different description techniques, and for specifying conditions that ensure the consistency of a system specification [17]. We will investigate issues such as the notion of refinement and develop refinement proof rules and algebraic proof rules. We will also tailor the framework to specific application domains, especially to the domain of critical systems such as e-business and e-government, with emphasis on security requirements. In connection with the CASE tools, an issue that needs further consideration is how to communicate feedback from the PVS toolkit back to software developers who may not be experts in formal methods. In the current version of the PrUDE tool, results from the PVS toolkit are reported in plain text. It should be possible to implement an 'intelligent' parser that can reinterpret the text from the PVS verification tools in order to indicate the component that contains the error. This will minimize the interaction of developers with the verification tools, which improves the practical usability of the CASE tool.

References

[1] M. Abadi and L. Lamport. An Old-fashioned Recipe for Real-Time. ACM Transactions on Programming Languages and Systems, 16(5):1543–1571, 1994. [2] P. Andre, A. Romanczuk, J.-C. Royer, and A. Vasconcelos. Checking the Consistency of UML Class Diagrams Using Larch Prover. In T. Clark, editor, Proc. of the third Rigorous Object-Oriented Methods Workshop (ROOM 3), January 2000. [3] M.
Archer, C. Heitmeyer, and S. Sims. TAME: A PVS Interface to Simplify Proofs for Automata Models. In the Proc. User Interfaces for Theorem Provers, July 1998. Technical report at Eindhoven Univ. of Technology, Netherlands. [4] D. Aredo, I. Traoré, and K. Stølen. An Outline of PVS Semantics for UML Class Diagrams (extended abstract). In the Proc. of The 11th Nordic Workshop on Programming Theory NWPT’99, Uppsala, Sweden, October 6-8, 1999. [5] D. B. Aredo. A Framework for Semantics of UML Sequence Diagrams in PVS. Journal of Universal Computer Science (JUCS), Know-Center in cooperation with Springer Pub. Co., Joanneum Research and the IICM, Graz University of Technology, 8(7):674–697, July 2002. [6] D. B. Aredo. Semantics of UML Statecharts in PVS. In the Proc. of 7th World Multiconference on Systemics, Cybernetics and Informatics (SCI2003), Orlando, Florida, USA, July 27-30, 2003. [7] M. Belaid and I. Traoré. The Precise UML Development Environment (PrUDE) Reference Guide. Technical Report ECE01-2, Department of Electrical and Computer Eng., University of Victoria, April 2001. [8] B. Boehm. Industrial Software Metrics Top 10 List. IEEE Software, 4(5):84–85, September 1987. [9] E.A. Boiten, J. Derrick, H. Bowman, and M.W.A. Steen. Constructive consistency checking for partial specification in Z. Science of Computer Programming, 35(1):29–75, September 1999. [10] G. Booch. Object-Oriented Analysis and Design with Applications. Benjamin Cummings, Redwood City, California, 1st edition, 1991. [11] G. Booch, J. Rumbaugh, and I. Jacobson. The Unified Modeling Language User Guide. Addison Wesley Longman Inc, Reading Massachusetts 01867, 1999. [12] R. H. Bourdeau and B. H.C. Cheng. A Formal Semantics for Object Model Diagrams. IEEE Transactions on Software Engineering, 21(10):799–821, October 1995. [13] J. P. Bowen and M. G. Hinchey. Ten Commandments of Formal Methods. 
Technical Report 350, University of Cambridge Computer Laboratory, Wolfson Building, Parks Road, Oxford, OX1 3QD, UK, September 1994. [14] H. Bowman, E. A. Boiten, J. Derrick, and M. W. A. Steen. Strategies for Consistency Checking Based on Unification. Science of Computer Programming, 33:261–298, April 1999. [15] R. Breu, R. Grosu, C. Hofmann, F. Huber, I. Kruger, B. Rumpe, M. Schmidt, and W. Schwerin. Exemplary and Complete Object Interaction Descriptions. In Haim Kilov, Bernhard Rumpe, and Ian Simmonds, editors, the Proc. of OOPSLA’97 Workshop on Object-oriented Behavioral Semantics, Atlanta, Georgia, October 1997. TUM-I9737. [16] Ruth Breu, Radu Grosu, Franz Huber, Bernhard Rumpe, and Wolfgang Schwerin. Towards a Precise Semantics for Object-Oriented Modeling Techniques. In Jan Bosch and Stuart Mitchell, editors, Object-Oriented Technology, ECOOP’97 Workshop Reader. Springer Verlag, LNCS 1357, 1997. [17] Ruth Breu, Ursula Hinkel, Christoph Hofmann, Cornel Klein, Barbara Paech, Bernhard Rumpe, and Veronika Thurner. Towards a Formalization of the Unified Modeling Language. In Mehmet Aksit and Satoshi Matsuoka, editors, ECOOP’97 – Object-Oriented Programming, 11th European Conference, volume 1241 of LNCS, pages 344–366. Springer, 1997. [18] M. Broy. On the Meaning of Message Sequence Charts. In ECOOP’97, Mehmet Aksit, Satoshi Matsuoka (ed.), volume LNCS 1241, Jyväskylä, Finland, June 1997. Springer Verlag. [19] M. Broy, F. Dederichs, M. Fuchs, T. F. Gritzner, and R. Weber. The Design of Distributed Systems - An Introduction to FOCUS, January 1993. [20] J. M. Bruel, B. Chintapally, R.B. France, and G. K. Raghavan. FuZE-Draft of the User’s Guide. Dep’t of Computer Science and Eng., Florida Atlantic University, FAU Technical Report TRCSE-96-9, 1996. [21] J.-M. Bruel and Robert B. France. Transforming UML Models to Formal Specifications. In the Proc. of the OOPSLA’98 Workshop on Formalizing UML. Why? How?, Vancouver, Canada, October 1998. [22] P. Chen.
The Entity-Relationship Model - Toward a Unified View of Data. ACM Transactions on Database Systems, 1(1):9–36, 1976. [23] D. Chiorean, M. Pasca, A. Carcu, C. Botiza, S. Moldovan, M. Bortes, H. Chiorean, I. Ciupa, and D. Corutiu. The OCLE Tool, December 2003. [24] D. Chiorean, M. Pasca, A. Carcu, C. Botiza, and S. Moldovan. Ensuring UML Models Consistency Using the OCL Environment. In Proc. of UML 2003 Workshop on OCL 2.0 - Industry Standard or Scientific Playground?, San Francisco, USA, October 21, 2003. [25] J.M.H. Cobben, A. Engels, S. Mauw, and M.A. Reniers. Annex B to Recommendation Z.120: Algebraic Semantics of Message Sequence Chart (MSC), 1995. [26] D. Coleman, P. Arnold, S. Bodoff, C. Dollin, H. Gilchrist, and P. Jeremaes. Object-Oriented Development: The Fusion Method. Prentice Hall, 1994. [27] Rational Software Corporation. Rational Rose 98, 1998. Available at www.rational.com/products/rose/index.jtmpl. [28] G. Coulouris, J. Dollimore, and T. Kindberg. Distributed Systems: Concepts and Design. Addison-Wesley, Essex, CM20 2JE, England, 2nd edition, 1994. [29] O.-J. Dahl and O. Owe. Formal Methods and the RM-ODP. Research report No. 261, March 1998. Department of Informatics, University of Oslo, Norway. [30] W. Damm and D. Harel. LSC’s: Breathing Life into Message Sequence Charts. In Formal Methods for Open Distributed Systems (FMOODS’99), Florence, Italy, February 15-18, 1999. [31] B. P. Douglass. UML Statecharts. Embedded Systems Programming (ESP), 12(1), January 1999. [32] D. Duke. Object-Oriented Formal Specification. PhD thesis, University of Queensland, 1991. [33] E.H. Dürr and N. Plat. VDM++ Language Reference Manual. Afrodite (ESPRIT-III project) document AFRO/CG/ED/LRM/V10, cap Volmac, 1995. [34] B. Dutertre and S. Schneider. Embedding CSP in PVS: An Application to Authentication Protocols.
In Theorem Proving in Higher Order Logics: 10th International Conference, TPHOLs ’97, volume 1275 of Lecture Notes in Computer Science, pages 121–136, Murray Hill, NJ, August 1997. Springer-Verlag. [35] S. Easterbrook, R. Lutz, R. Covington, J. Kelly, Y. Ampo, and D. Hamilton. Experiences Using Lightweight Formal Methods for Requirements Modeling. IEEE Trans. on Soft. Eng., 24:4–14, Jan. 1998. [36] G. Engels, R. Heckel, and S. Sauer. UML - A Universal Modeling Language? In the Proc. of ICATPN 2000, LNCS 1825, pages 24–38, Berlin, Heidelberg, 2000. Springer-Verlag. [37] A. Evans. Reasoning with UML Class Diagrams. In the Proc. of WIFT’98. IEEE Press, 1998. [38] A. Evans and T. Clark. Foundations of the Unified Modeling Language. In the Proc. of the 2nd BCS-FACS Northern Formal Methods Workshop, Ilkley, UK, 23-24 September, 1997. [39] A. Evans, R. B. France, K. Lano, and B. Rumpe. Developing the UML as a Formal Modelling Notation. In Jean Bézivin and Pierre-Alain Muller, editors, The Unified Modeling Language, UML’98 - Beyond the Notation. First International Workshop, Mulhouse, France, pages 297–307, June 1998. [40] M. Fowler and K. Scott. UML Distilled: Applying the Standard Object Modeling Language. Addison Wesley Longman, Inc., 1997. 11th reprinting, June 1999. [41] R. B. France, J.-M. Bruel, M. Larrondo-Petrie, and M. Shroff. Exploring the Semantics of UML Type Structures with Z. In H. Bowman and J. Derrick, editors, the Proc. 2nd IFIP Conf. Formal Methods for Open Object-Based Distributed Systems (FMOODS’97). Chapman and Hall, London, 1997. [42] R. B. France, J.-M. Bruel, and M. M. Larrondo-Petrie. An Integrated Object-Oriented and Formal Modeling Environment. Journal of Object-Oriented Programming (JOOP), 10(7), December 1997. [43] R. B. France, A. Evans, K. Lano, and B. Rumpe. The UML as a Formal Modeling Notation. Computer Standards & Interfaces, 19:325–334, 1998. [44] M. D. Fraser, K. Kumar, and V. K. Vaishnavi.
Strategies for Incorporating Formal Specification in Software Development. Communications of the ACM, 37(10):74–86, October 1994. [45] M. J. C. Gordon and T. F. Melham. Introduction to HOL (A theorem-proving environment for higher order logic). Cambridge University Press, 1993. [46] John V. Guttag, James J. Horning, S.J. Garland, and K.D. Jones. Larch: Languages and Tools for Formal Specification. Springer-Verlag, 1993. [47] D. Harel, A. Pnueli, J. P. Schmidt, and R. Sherman. On the Formal Semantics of Statecharts. In the Proc. of the 2nd IEEE Symposium on Logic in Computer Science, pages 54–64, New York, USA, 1987. IEEE Press. [48] David Harel and Bernhard Rumpe. Modeling Languages: Syntax, Semantics and All That Stuff - Part I: The Basic Stuff. Technical Report MCS00-16, Faculty of Mathematics and Computer Science, The Weizmann Institute of Science, Israel, September 2000. [49] M. Heimdahl and N. Leveson. Completeness and Consistency Analysis of State-Based Requirements. IEEE Trans. On Software Engineering, 22:363–377, November 1996. [50] C. L. Heitmeyer, R.D. Jeffords, and B.G. Labaw. Automated Consistency Checking of Requirements Specifications. ACM Trans. on Software Engineering and Methodology, 5(3):231–261, July 1996. [51] K. L. Heninger. Specifying Software Requirements for Complex Systems: New Techniques and their Application. IEEE Trans. on Software Eng., 6(1), January 1980. [52] C. A. R. Hoare. Communicating Sequential Processes. Prentice Hall, 1985. [53] ITU-TS. ITU-TS Recommendation Z.120: Message Sequence Chart (MSC), 1996. [54] I. Jacobson, M. Christerson, P. Jansson, and G. Övergaard. Object-Oriented Software Engineering: A Use Case Driven Approach. Addison-Wesley, Wokingham, England, 1992. [55] E. B. Johnsen and O. Owe. A PVS proof environment for OUN. Research report No. 295, Department of Informatics, University of Oslo, Norway, June 2001. [56] ISO-IEC JTC1/SC21/WG7. Reference Model of Open Distributed Processing (RM-ODP), 1995. [57] P. Kellomäki.
Verification of reactive systems using DisCo and PVS. In Formal Methods Europe FME’97, volume 1313 of Lecture Notes in Computer Science, pages 589–604, Graz, Austria, September 1997. Springer-Verlag. [58] S. Kent, A. Evans, and B. Rumpe. UML Semantics FAQ. In ECOOP’99 Workshop Reader. Springer Verlag, LNCS, December 1999. [59] P. Krishnan. Consistency Checks for UML. In Proc. of the Asia Pacific Software Engineering Conference (APSEC 2000), pages 162–169, December 2000. [60] P. B. Ladkin and S. Leue. What Do Message Sequence Charts Mean? In R.L. Tenney, P.D. Amer, and M.U. Uyar, editors, Formal Description Techniques VI, IFIP Transactions C, Proceedings of the 6th International Conference on Formal Description Techniques, North-Holland, Amsterdam, 1994. [61] P.B. Ladkin and S. Leue. Comments on a Proposed Semantics for Basic Message Sequence Charts. The Computer Journal, 37(9):814–15, January 1995. [62] P.B. Ladkin and S. Leue. Four Issues Concerning the Semantics of Message Flow Graphs. In D. Hogrefe and S. Leue, editors, Formal Description Techniques VII, Proc. of the Seventh IFIP International Conference on Formal Description Techniques FORTE’94. Chapman & Hall, 1995. [63] K. Lano and H. Haughton. The Z++ Manual. Technical Report, Imperial College, London, 1994. [64] Kevin Lano and Juan Bicarregui. Formalising the UML in Structured Temporal Theories. In Haim Kilov and Bernhard Rumpe, editors, the Proc. Second ECOOP Workshop on Precise Behavioral Semantics (with an Emphasis on OO Business Specifications), pages 105–121. Technische Universität München, TUM-I9813, 1998. [65] D. Latella, I. Majzik, and M. Massink. Automatic Verification of a Behavioural Subset of UML Statechart Diagrams Using the SPIN Model-checker. Formal Aspects of Computing, 11(6):637–664, 1999. [66] D. Latella, I. Majzik, and M. Massink. Towards a Formal Operational Semantics of UML Statechart Diagrams. In the Proc. of FMOODS’99, Florence, Italy. Kluwer, February 15-18, 1999.
[67] M. Lawford, P. Froebel, and G. Moum. Practical Application of Functional and Relational Methods for the Specification and Verification of Safety Critical Software. In T. Rus, editor, the Proc. of Algebraic Methodology and Software Technology, 8th International Conference, AMAST 2000, Iowa City, Iowa, USA, May 2000, volume 1816 of Lecture Notes in Computer Science, pages 73–88. Springer, 2000. [68] Xuandong Li and Johan Lilius. Checking Compositions of UML Sequence Diagrams for Timing Inconsistency. In the Proc. of 7th Asia Pacific Software Engineering Conference (APSEC 2000). IEEE Computer Society, 2000. [69] J. Lilius and I. P. Paltor. Formalizing UML State Machines for Model Checking. In the Proc. of UML1999 - The Unified Modeling Language Beyond the Standard, volume LNCS 1723, 1999. [70] S. Mauw. The formalization of Message Sequence Charts. Computer Networks and ISDN Systems, 28(12):1643–1657, 1996. [71] S. Mauw and M. A. Reniers. Formalization of Static Requirements for Message Sequence Charts, 1994. Joint rapporteurs meeting SG10. [72] S. Mauw and M.A. Reniers. An algebraic semantics of Basic Message Sequence Charts. The Computer Journal, 37(4):269–277, 1994. [73] E. Mikk, Y. Lakhnech, and M. Siegel. Hierarchical Automata as Model for Statecharts. In R. K. Shyamasundar and K. Ueda, editors, the Proc. of Asian Computing Science Conference (ASIAN’97), volume 1345 of LNCS, pages 181–196. Springer Verlag, December 9-11, 1997. [74] A. Evans (moderator), S. Cook, S. Mellor, J. Warmer, and A. Wills. Advanced Methods and Tools for a Precise UML (panel paper). In the Proc. of 2nd International Conference on the Unified Modeling Language, LNCS 1723, Colorado, USA, 1999. [75] A. Moreira and R. Clark. Combining Object-oriented Analysis and Formal Description Techniques. In the Proc. of ECOOP’94, LNCS, volume 821, Bologna, Italy, 1994. Springer-Verlag. [76] Darmalingum Muthiayen. Real-Time Reactive System Development – A Formal Approach Based on UML and PVS.
PhD thesis, Department of Computer Science at Concordia University, Montreal, Canada, January 2000. [77] NASA. Formal Methods Specification and Analysis Guide book for the Verification of Software and Computer Systems: A Practitioner’s Companion. Technical report, NASA, Washington, DC 20546, May 1997. Report No. NASA-GB-001-97. [78] B. Nuseibeh, J. Kramer, and A. Finkelstein. A Framework for Expressing The Relationships between Multiple Views in Requirement Specification. IEEE Trans. On Soft. Eng., 20(10):760– 773, October 1994. 43 References [79] OMG. OMG Unified Modeling Language Specification, version 1.3, June 1999. OMG standard. [80] O. Owe and I. Ryl. The Oslo University Notation: A Formalism for Open, Object-Oriented, Distributed Systems. Report No. 270, August 1999. Department of Informatics, University of Oslo, Norway. [81] S. Owre, J. Rushby, N. Shankar, and F.V. Henke. Formal Verification for Fault-tolerant Architectures: Prolegomena to the design of PVS. IEEE Transactions On Software Engineering, 21(2):107–125, February 1995. [82] S. Owre, N. Shankar, J. Rushby, and D. W. Stringer-Calvert. PVS System Guide, version 2.3. Computer Science Laboratory, SRI International, Melon Park, CA, September 1999. [83] S. Owre, N. shankar, and J. M. Rushby. The PVS Specification Language, April 1993. Computer Science Lab., SRI International. [84] R. F. Paige, J. S. Ostroff, and P. J. Brooke. Checking the Consistency of Collaboration and Class Diagrams using PVS. In Proc. of Fourth Workshop on Rigorous Object-Oriented Methods (ROOM4), British Computer Society, London, U.K., March 2002. [85] pUML. The Precise UML Group (pUML) WWW page, http://www.cs.york.ac.uk/puml/. 2001. URL address [86] G. Reggio, E. Astesiano, C. Choppy, and H. Hussmann. Analysing UML Active Classes and Associated State Machines – A Lightweight Formal Approach. In Tom Maibaum, editor, the Proc. Fundamental Approaches to Software Engineering (FASE 2000), Berlin, Germany, volume 1783 of LNCS. 
Springer, 2000. [87] Mark Richters. The UML Bibliography, 2001. URL address http://www.db.informatik.unibremen.de/umlbib/. [88] J. Rumbaugh. OMT Insights: Perspectives on Modeling. SIGS Books, New York, October 1996. [89] J. Rumbaugh and M. Blaha. Tutorial Notes: Object-Oriented Modeling and Design. In the Proc. of OOPSLA’91 Conference, Phoenix, Arizona, October 1991. [90] J. Rumbaugh, M. Blaha, W. Premerlani, F. Eddy, and W. Lorensen. Object-Oriented Modeling and Design. Prentice Hall, Englewood Cliffs., N.J., 1991. [91] J. Rumbaugh, I. Jacobson, and G. Booch. The Umified Modeling Language, Reference Manual. Addison Wesley Longman Inc., 1999. [92] Bernhard Rumpe. A Note on Semantics (with an Emphasis on UML). In Haim Kilov and Bernhard Rumpe, editors, the Proc. of 2nd ECOOP Workshop on Precise Behavioral Semantics, pages 177–197. Technische Universit”at M”unchen, TUM-I9813, 1998. [93] J. Rushby. Specification, proof checking, and model checking for protocols and distributed systems with PVS. In FORTE X/PSTV XVII ’97: Formal Description Techniques and Protocol Specification, Testing and Verification, November 1997. [94] S. A. Seshia, R. K. Shyamasundar, A. K. Bhattacharjee, and S. D. Dhodapkar. A Translation of Statecharts to Esterel. In the Proc. of FM’99 – Formal Mthods Volume II, Toulouse, France, volume 1708 of LNCS, pages 983–1007, Berlin, Germany, September 20-24, 1999. SpringerVerlag. [95] N. Shankar, S. Owre, and J. Rushby. The PVS Prover-checker: A Reference Manual, April 1993. [96] N. Shankar, S. Owre, J. Rushby, and D. W. Stringer-Calvert. PVS Prover Guide, September 1999. Available at http://pvs.csl.sri.com/manuals.html. [97] N. Shankar and Sam Owre. Principles and pragmatics of subtyping in PVS. In Recent Trends in Algebraic Development Techniques, WADT ’99, volume 1827 LNCS, pages 37–52, Toulouse, France, September 1999. Springer-Verlag. [98] S. Shlaer and S. Mellor. Object-oriented Systems Analysis: Modeling the World in Data. 
Yourdon Press Computing Series, Prentice Hall, Englewood Cliffs, NJ, 1991. 44 References [99] M. Shroff and R. B. France. Towards a formalization of UML Class Structures in Z. In the Proc. of the COMPSAC’97, 1997. [100] A. J. H. Simons and I. Graham. 30 Things that go wrong in object modelling with UML 1.3, chapter 17, pages 237–257. Kluwer Academic Publishers, behavioral specifications of businesses and systems eds. edition, 1999. [101] J. M. Spivey. The Z Notation: A Reference Manual. Prentice-Hall International, 2nd edition, 1992. [102] K. Stølen. A Comparison of Eleven Specification Languages. Technical Report HWR-523, OECD Halden Reactor Project, Halden, Norway, March 1998. [103] K. Stølen, T.W. Karlsen, P. Mohn, and H. Sandmark. Using CASE Tools on Formal Methods on Real-life Software Development of Distributed Systems. Technical Report HWR-522, OECD Halden Reactor Project, IFE Halden, Norway, March 1998. [104] I. Traoré. The UML Specification of the Integrator. Research report No. 275, August 1999. Department of Informatics, University of Oslo, Norway. [105] I. Traoré. An Outline of PVS Semantics for UML Statecharts. Jounal of Universal Computer Science, 6(11):1088–1108, 2000. [106] I. Traoré, D. B. Aredo, and K. Stølen. Tracking Inconsistencies in an Integrated Platform. Research report No. 274, August 1999. Department of Informatics, University of Oslo, Norway. [107] I. Traoré, D. B. Aredo, and H. Ye. An Integrated Framework for Formal Development of Distributed Systems. Journal of Information and Software Technology, Elsevier Science, 46(5):281– 286, April 2004. [108] I. Traoré, A. Jeffroy, M. Romdhani, and A.E.K. Sahraoui. An Experience with a Multiformalism Specification of an Avionics System. In the Proc. INCOSE 98, Vancouver, Canada, July 25-31, 1998. [109] I. Traoré and K. Stølen. Towards the Definition of a Platform supporting the Formal Development of Open Distributed Systems. Research report No. 271, April 1999. 
Department of Informatics, University of Oslo, Norway. [110] J. J. P. Tsai, Y. Bi, S. J. H. Yang, and R. A. W. Smith. Distributed Real-Time Systems: Monitering, Visualization, Debugging and Analysis. John Weley & Sons, 605 Third Avenue, New York, USA, 1996. [111] A. Tsiolakis. Semantic Analysis and Consistency Checking of UML Sequence Diagrams. Technical Report 2001-06, Technische Universität Berlin, Department of Computer Science, April 2001. [112] J. B. Warmer and A. G. Kleppe. The Object Constraint Language: Precise Modeling with UML. Addison Wesley Longman Inc., 1999. [113] J. M. Wing. A Specifier’s Introduction to Formal Methods. IEEE Computer, 23:8–24, September 1990. 45 References 46 Appendix A Formal Development of Open Distributed Systems: Towards an Integrated Framework I. Traoré, D. B. Aredo and K. Stølen Publication: I. Traoré, D. B. Aredo and K. Stølen: Formal Development of Open Distributed Systems: Towards an Integrated Framework, in the Proc. of Workshop on Object-Oriented Specification Techniques for Distributed Systems and Behaviors (OOSDS’99), September 1999, Paris, France. Formal Development of Open Distributed Systems: Towards an Integrated Framework Issa Traoré, Demissie Aredo and Ketil Stølen Department of Informatics, University of Oslo P. O. Box 1080 Blindern, N-0316 Oslo, Norway Abstract This paper contributes to the discussion on issues related to the formal development of open distributed systems. The deficiencies of traditional formal notations in this setting are highlighted. We argue that there is no single formalism exhibiting all the features required. As a solution, we propose a multi-formalism platform that involves three formalisms: UML, OUN and PVS-SL. We discuss the motivation for the choice of these formalisms and the main research issues underlying this kind of platform. 
Keywords: Formal Methods, Open Distributed Systems, UML, PVS, OUN, Multiformalism, Object-orientation

1 Introduction and Problem Statement

Motivated by the need for modeling the dynamic features of object-oriented programming languages and openness in distributed applications, the study of open, dynamically extendable systems has become a very popular research area. In fact, since the late 80s, much research within theoretical computer science has been directed towards this kind of system. The emphasis has mainly been put on semantic issues; in particular, on how such systems should be represented faithfully and fully abstractly. This has, for example, led to the development of the Pi-calculus [14], and to new refinements of the Actor model [1]. Most of the early proposals have a strong operational flavor. More recent denotational approaches [10, 18] are rather technical, and in most cases directed towards the Pi-calculus.

The above-mentioned research attempts to find mathematical models suitable for describing the semantics of systems. The emphasis in our work is not on the semantics of systems, but rather on formal system development. Existing formal development methods suffer from certain limitations that constrain their application to large-scale projects; in particular, their esoteric nature is a serious obstacle. This fact is well expressed by Kneuper as follows: "Software development is done by people, not by machines. No matter how 'good' a development method is, it will only be successful if the developers who are to use it are willing and able to do so" [13]. Most specification techniques supporting the development of open distributed systems, such as the UML (Unified Modeling Language) [16, 3], lack the formal semantics and the various reasoning facilities underlying formal development methods.
Moreover, we are not aware of any conventional formal development method that is able to fully handle the flexible, extendable and very dynamic features characterizing contemporary distributed systems. In RM-ODP [12], formal description techniques such as LOTOS [9], Z, SDL and Estelle are proposed for the specification of the various viewpoints involved. But, as pointed out by Dahl and Owe in [6], these languages are only partly satisfactory. For instance, we may use Z for the description of the static parts of the information viewpoint, but it is not suitable for dealing with the dynamic aspects. SDL and Estelle give little support for formal reasoning. LOTOS is a flexible description technique, but in our opinion it is mainly suitable for the design phase.

Taking the above remarks into account, the challenge is to build a platform that:
- can be grasped and used in an industrial context; this requires characteristics such as communicability and user friendliness.
- supports the main aspects exhibited by open distributed systems, such as openness and dynamic reconfiguration.
- produces formal specifications that are amenable to rigorous verification and validation.
- comes with efficient tool support, a prerequisite for its application to large-scale systems.

We are not aware of any single specification technique or method that provides all these capabilities. One obvious solution is to build a completely new method from scratch. However, this is extremely costly. Instead, we propose a multi-formalism approach where we adapt and combine already existing technologies. More explicitly, based on the evaluation of several existing methods and CASE-tools [20, 19], we propose a platform based on the UML and the OUN (Oslo University Notation) [17] for specification and refinement, and on the PVS-SL (Prototype Verification System Specification Language) [5] for the semantic foundation.

The rest of the paper is organized as follows: In Section 2 we discuss the rationale behind the choice of the specification formalisms underlying the platform. Then, in Section 3, we discuss some of the main research topics involved. Finally, in Section 4 we make some concluding remarks.

2 Choice of Notations Underlying the Platform

In this section, we give an overview of the notations and formalisms involved and discuss the rationale behind their choice.

2.1 The Unified Modeling Language

The choice of UML was dictated by the fact that it is built on an object-oriented framework and provides several capabilities, such as extensibility mechanisms (e.g. stereotypes) and dynamic and multiple classification, which are useful for the description of open distributed systems. In addition, UML provides an underlying methodology for specification and refinement and a graphical notation that contributes to communicability and user friendliness. Very importantly, UML is an international standard for object-oriented modeling.

2.1.1 Support for open distribution

Being an object-oriented approach, UML provides several capabilities such as encapsulation, data abstraction, extensibility, reusability and flexibility, which are helpful in modeling open distributed systems. Among the extensibility mechanisms, we can mention stereotypes for adding new building blocks, tagged values for creating new properties for existing constructs, and constraints for extending the semantics of a UML construct. Concerning data abstraction, there is a complete separation between specification and implementation objects. This allows us to design in terms of interfaces and to enable the evolution of the system by replacing an object with an alternative implementation. An interface is a collection of operations that are used to specify a service of a class or a component.
A component is a physical and replaceable part of a system that conforms to and provides the realization of a set of interfaces.

In most object-oriented languages, objects are statically typed, so their types are bound at creation time. In UML, this is expressed by class diagrams. In addition, there are mechanisms for handling the dynamic nature of an object type, which can be helpful in modeling dynamic reconfiguration in the context of open distribution. This is achieved through a set of interfaces that a class may implement. An instance of such a class supports all of those interfaces, but depending on the context, it may present only one or some of them as relevant. Each of these interfaces represents a role that an object can play over time.

For instance, Figure 1 is extracted from the specification of a mobile telephone system consisting of one central telephone exchange (not represented in the figure), two switching stations S1 and S2, and a mobile telephone T attached to a vehicle moving around. The stations cover different (possibly overlapping) areas. The telephone should always be in contact with at least one of the stations, which is at that time the base station, the other station being idle. In Figure 1, we define a class Station and its different roles by two interfaces: Base and IdleBase. In an association between the Station and Telephone classes, the Station class plays the role s1, whose type is Base; in another association Station may play another role, say as IdleBase.
Dynamic typing can also be rendered through an interaction diagram, by displaying the role of each instance of the corresponding class in brackets below the object's name or by connecting each variant with a become message. For instance, in Figure 2 (extracted from a collaboration diagram describing the above mobile phone system), object o of type Station changes its role from Base to IdleBase. During the interaction, a change in an object's attribute values, states, roles or relationships can also be modelled by attaching specific constraints to it, such as new, destroyed or transient, to specify respectively creation, destruction and modification of the object.

[Figure 1: Dynamic Typing through Class Diagram. Elements shown: interface Telephone (activechs: Channel); interface Base with operation disconnect(c: Channel); interface IdleBase with operation connect(c: Channel); class Station (activechs: set[Channel]); associations isConnectTo and mayConnect between Telephone (roles t1, t2, multiplicity *) and Station (roles s1: Base, s2: IdleBase, multiplicity 1).]

[Figure 2: Dynamic Typing through Interaction Diagram. Object o: Station [Base] is connected by a <<become>> message to o: Station [IdleBase].]

UML also provides several facilities for modeling distributed architecture, especially component and deployment diagrams. A deployment diagram consists of nodes, which represent the physical deployment of components; a node can be a processor or a device. We use nodes to model the topology of the hardware on which the system executes. We use component diagrams in conjunction with object diagrams and interaction diagrams (as mentioned previously) to model mobility. For instance, Figure 3 shows a system consisting of migrating components. For load balancing purposes and failure recovery, the system consists of databases replicated across several nodes.

[Figure 3: Modeling Migrating Components. Component data.db {location = Server S1} is connected by a <<copy>> dependency to data.db {location = Server S2}.]
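
The role-based view of the Station class described in this subsection can be illustrated outside UML. The following Python sketch is only an illustration under stated assumptions and is not part of the platform: the names Base, IdleBase, Station and activechs follow Figure 1, whereas the use of typing.Protocol, the representation of a channel as a string, and the helper hand_over are hypothetical choices made for this example.

```python
from typing import Protocol, Set


class Base(Protocol):
    """Role of the station a telephone is currently connected to."""
    def disconnect(self, c: str) -> None: ...


class IdleBase(Protocol):
    """Role of a station that a telephone may connect to."""
    def connect(self, c: str) -> None: ...


class Station:
    """One class supporting both roles; a client sees only the role it needs."""
    def __init__(self) -> None:
        self.activechs: Set[str] = set()  # channels modelled as strings (assumption)

    def connect(self, c: str) -> None:
        self.activechs.add(c)

    def disconnect(self, c: str) -> None:
        self.activechs.discard(c)


def hand_over(old: Base, new: IdleBase, channel: str) -> None:
    """Hypothetical helper: move a channel from the base station to an idle one."""
    new.connect(channel)
    old.disconnect(channel)


s1, s2 = Station(), Station()
s1.connect("ch1")          # s1 acts as the base station
hand_over(s1, s2, "ch1")   # s2 takes over the channel; s1 becomes idle
```

The same Station object is passed once under the Base role and once under the IdleBase role, which mirrors the idea that each interface is a role an object can play over time.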

2.1.2 Limitations

In spite of the benefits it provides, UML has several limitations in the context of the formal modeling of open distributed systems. The graphical constructs provided by UML are not enough to achieve a complete and precise specification of the system. For instance, in [7] several cases of incompleteness in the static semantic model of UML are reported, especially concerning the definitions of the concepts of aggregation, inheritance, constraints on inheritance hierarchies and abstract operation descriptions. In order to fill this gap, there is a need for extending the capabilities of the UML with respect to two main objectives:
• The description of additional constraints about the objects in the model, such as invariants on classes and types, abstract definitions of operations and attributes, non-functional requirements, etc.
• The definition of a formal semantics for the different constructs involved, in order to remove all ambiguities.

The first objective is generally accomplished using natural language, which results in ambiguities. An alternative approach is to deal with both issues in OCL (Object Constraint Language) [16], a semi-formal constraint language that is easy to read and write and is used to specify the well-formedness of the modeling abstractions provided by the UML. An OCL specification consists of a set of expressions without side effects. OCL has modeling constructs for types, classes, interfaces and associations, but its expressiveness is relatively limited with respect to the dynamic aspects of systems. For instance, non-query operations cannot easily be handled by OCL. Moreover, it is not possible in OCL to invoke processes or activate non-query operations, nor to write program logic or control flow. In fact, as pointed out in [7], the semantics of OCL is not mathematically defined, and hence it does not provide the facilities required for rigorous analysis: at most, there is a set of type conformance rules.
OCL is not oriented towards abstract observable system behaviors that are modelled by interfaces. Hence, instead of basing our platform on OCL, we have decided to use two other formalisms, OUN and PVS-SL, each of which is well suited to one of the two objectives mentioned earlier.

2.2 The Oslo University Notation

One of our objectives in this platform is the production of abstract descriptions of systems. Trace-based notations are very efficient for this purpose [11]. However, most of the existing trace-based notations do not support object-orientation, openness and dynamic reconfiguration; hence the choice of OUN for this platform. OUN is a formal development method that takes the deficiencies of traditional formal notations into account by addressing the main aspects of open distributed systems. Used in conjunction with UML, it can describe the invariants and constraints attached to the main constructs of UML such as types, classes and interfaces. The main properties of objects, such as attributes and operations (with or without side effects), can be expressed in OUN. In addition, the extensibility mechanisms of UML that serve to define new UML notions match the specific needs of OUN. In contrast to OCL, OUN addresses the main implementation issues at an abstract level. The major concepts considered in OUN include:
- Objects with internal activity and structure.
- Interfaces with syntactic and semantic specification of methods.
- Classes with state variables and imperative style implementation.
- Contracts used to restrict the interactions among a set of objects.

Inspired by Java and CORBA, OUN considers high-level object-oriented concepts, and is oriented towards practical specification rather than operational semantics [6]. Objects are specified by means of invariants on historic information: finite or infinite sequences of parameterized events that describe interactions between the object and its environment.
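
To make the idea of an invariant over a communication history more concrete, the following Python sketch models a finite history as a list of events and checks one illustrative invariant on the part of the history that involves a given object. This is not OUN and covers finite histories only; the names Event, project, is_prefix_of_repetition and invariant, as well as the chosen invariant (calls alternate between connect and disconnect, starting with connect), are assumptions made for this example.

```python
from typing import List, NamedTuple, Sequence


class Event(NamedTuple):
    caller: str   # object issuing the call
    callee: str   # object receiving the call
    method: str   # name of the invoked operation


def project(history: Sequence[Event], obj: str) -> List[Event]:
    """Keep only the events in which obj takes part, in their original order."""
    return [e for e in history if obj in (e.caller, e.callee)]


def is_prefix_of_repetition(methods: Sequence[str], pattern: Sequence[str]) -> bool:
    """True if methods is a prefix of the (non-empty) pattern repeated indefinitely."""
    return all(m == pattern[i % len(pattern)] for i, m in enumerate(methods))


def invariant(history: Sequence[Event], obj: str) -> bool:
    """Illustrative invariant: obj's calls alternate connect, disconnect, ..."""
    methods = [e.method for e in project(history, obj)]
    return is_prefix_of_repetition(methods, ["connect", "disconnect"])


history = [
    Event("t", "s2", "connect"),
    Event("u", "s1", "connect"),     # another telephone; ignored when projecting on t
    Event("t", "s2", "disconnect"),
    Event("t", "s1", "connect"),
]
```

Because the property is closed under prefixes, checking it on the current finite history also covers every earlier point in that history.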
Consequently, only information visible outside the object, such as its signature and operation invocations, is considered. Dynamic object creation and addition of interfaces, and multiple inheritance of interfaces and classes, are supported.

An OUN requirement specification is provided in terms of interfaces and contracts. In contrast to UML, the concept of class appears later, during design specification. An interface contains the syntactic definitions of operations. It also contains a requirement specification in assumption-guarantee form, which may consist of an invariant asserting properties that each object implementing the interface should satisfy, and an assumption stating minimal contextual requirements. In contrast to UML, objects are typed by interface. This, in addition to the possibility for an object to implement several interfaces, provides facilities for dynamic typing and hence for open distribution. In the following, we give an OUN specification of a contract that specifies an interaction among objects of the interfaces Base, IdleBase and Telephone defined previously for the mobile phone system.

    interface Base
    begin
      opr disconnect()
    end

    interface IdleBase
    begin
      opr connect()
    end

    interface Telephone
    begin
    end

    contract Switch(b: Base, ib: IdleBase, t: Telephone)
    begin
      inv H/t prs [connect, disconnect]*
    end

The invariant states that a request for a connection (connect message) should be followed by a disconnect message. H denotes the global communication history; the projection of the history onto an object o, denoted by H/o, corresponds to the sequence of method calls involving object o since its creation. The keyword prs is an abbreviation of "prefix of regular sequence".

A class contains definitions of attributes, implementations of operations and possibly an invariant and assumptions. An abstract implementation of the class Station is given below.
Operations are defined using guarded commands; an unsatisfied guard represents waiting. The with clause states that only objects of the interface mentioned in the clause may interact with objects of the class through the listed operations. The keywords ops, asm and inv are used respectively for operations, assumptions, and invariants defined in a class or an interface.

    class Station implements Base, IdleBase
    begin
      var activechs: Set(Channel)
      with Telephone
      ops
        connect(n : Channel) == true → activechs := add(activechs, n)
        disconnect(m : Channel) == true → activechs := del(activechs, m)
      caller asm ...
      inv ...
    end

where add and del are functions that, respectively, add a given channel to and remove it from the set of active channels of a telephone. In OUN, it is possible to extend a class dynamically by adding operations and interfaces. This is a further form of support that OUN provides for open distribution.

3 Integrating UML and OUN

3.1 Main Research Issues

The implementation of an integrated platform raises a number of research issues, among which the following can be mentioned:
• identification of the interactions among the different formalisms involved, namely UML and OUN, which gives rise to a number of consistency proof rules. In [22], the authors define consistency relations that should hold between partial specifications developed using this platform.
• definition of refinement proof rules.
• definition of formal semantics for UML and OUN constructs in the PVS specification language.

Next, we discuss the last issue, namely the definition of the formal semantics of UML in PVS-SL; a discussion of the other issues can be found in [21].

3.2 Formalising Object-oriented Models

Several works have attempted to provide a mathematical basis for the concepts underlying object-oriented models. Some of these approaches consist of adapting or extending a novel or existing formal description technique with object-oriented concepts [15].
Others derive a formal specification from the semi-formal (or informal) model built with existing object-oriented notations such as UML or OMT [8]. The main problem with these approaches is that the user has to deal with a certain amount of formal artifacts, and as we have already argued, this can be a barrier to industrial use.

A third approach, which has been adopted in this platform, consists of assigning a formal semantics to an existing object-oriented notation [7]. In this case, the formal "stuff" is hidden behind the graphical notation, and the user deals with the graphical model, while the formal stuff is processed automatically at the back-end.

In [24], a formal language $L$ is represented as a triple $(\mathit{Syn}_L, \mathit{Sem}_L, R)$, where $\mathit{Syn}_L$ is a notation (the syntactic domain), $\mathit{Sem}_L$ is a set of objects (the semantic domain), and $R$ is a relation between them: $R \subseteq \mathit{Syn}_L \times \mathit{Sem}_L$. $R$ is based on precise rules that define which objects satisfy each specification. Hence, since we use the notations provided by UML and OUN, and assign to them a formal semantics in PVS-SL, we define our satisfaction relation accordingly:

$R \subseteq \mathit{Syn}_{UML,OUN} \times \text{PVS-SL}$

For instance, in the case of UML class diagram components, the main semantic entities involved are the notions of types and relation concepts. A class and an interface are both defined as record types that provide their specific data type definitions. An interface is defined as a record type whose fields are the signatures of its operations. A class theory defines a record type whose set of fields includes the declarations of the attributes and the signatures of the operations. If the class (or interface) is a subclass in some generalization relationship, then the record should include all the attributes and operations inherited. The record representing a class or interface is extended by one field for each of its superclasses or superinterfaces.
These representations make the superclass/interface explicit. The record may also include the operations defined in the interfaces implemented by the class. Objects are defined as instances of the record type defined. A general scheme of a theory in which a record type represents a meta-class (i.e. its instances are classes) is as follows:

    Classifiers : THEORY
    BEGIN
      Expression : TYPE
      VisibilityKind : TYPE = {public, protected, private}
      Attribute : TYPE = [# name : string,
                            visibility : VisibilityKind,
                            initialValue : Expression #]
      Operation : TYPE = [# name : string,
                            visibility : VisibilityKind,
                            spec : string #]
      Interface : TYPE = [# name : string,
                            operations : setof[Operation] #]
      Class : TYPE = [# name : string,
                        attributes : setof[Attribute],
                        operations : setof[Operation] #]
      % schematic: a classifier is either an interface or a class
      Classifier : TYPE = union(Interface, Class)
    END Classifiers

The fields attributes and operations specify, respectively, the sets of attributes and operations locally declared in the class. If a class is a specialization of another class, e.g. SupName, then the record type contains an additional field asSupName that captures the structure and behavior inherited from the superclass. A similar approach is used for a class that realizes an interface.

An association is a relationship that involves two or more classifiers. In the sequel, however, we consider only binary associations and represent them as (ordered) pairs of association ends. An association end is a model element that specifies an endpoint of an association, which connects the association to a classifier. It is defined as a record type that defines a set of properties such as the classifier, the role of the classifier, and its multiplicity. The formal representation of an Association is given as an ordered pair of AssociationEnd records in the direction of navigation.
Because we consider only binary associations, the well-formedness requirement that constrains an association to have at least two association ends is fulfilled. We assume that every association is navigable. A bidirectional association is modelled as two directed associations, one in each direction.

    Associations : THEORY
    BEGIN
      IMPORTING Classifiers
      Aggregation : TYPE = {none, aggregate, composite}
      AssociationEnd : TYPE = [# name : string,
                                 aggregation : Aggregation,
                                 classifier : Classifier,
                                 role : string,
                                 multiplicity : setof[nat] #]
      Association : TYPE = [# name : string,
                              connection : [AssociationEnd, AssociationEnd] #]
    END Associations

In order to formally represent a class diagram, we put everything together by importing the respective theories of its components, instantiating the elements that exist in the class diagram, and defining the necessary constraints and invariants upon them. For instance, in the following theory, we represent the class diagram shown in Figure 1; let us call it MobilePhoneSystem. Assume that the classifiers Telephone, Station, etc. are defined.

    MobilePhoneSystem : THEORY
    BEGIN
      IMPORTING Telephone, Station, Base, IdleBase
      s : VAR Station; t : VAR Telephone

      PhoneEnd1 : AssociationEnd = (# name := "phoneEnd",
                                      aggregation := none,
                                      classifier := Telephone,
                                      role := "t1",
                                      multiplicity := nat #)
      PhoneEnd2 : AssociationEnd = (# name := "phoneEnd2",
                                      aggregation := none,
                                      classifier := Telephone,
                                      role := "t2",
                                      multiplicity := nat #)
      BaseEnd : AssociationEnd = (# name := "BaseEnd",
                                    aggregation := none,
                                    classifier := Station,
                                    role := "s1",
                                    multiplicity := {1} #)
      IdleEnd : AssociationEnd = (# name := "IdleEnd",
                                    aggregation := none,
                                    classifier := Station,
                                    role := "s2",
                                    multiplicity := {1} #)

      isConnectedTo : Association = (# name := "isconnectedTo",
                                       connection := (PhoneEnd1, BaseEnd) #)
      mayConnected : Association = (# name := "mayConnected",
                                      connection := (PhoneEnd2, IdleEnd) #)

      ae1, ae2 : VAR AssociationEnd; c1, c2 : VAR Classifier
      ass : VAR Association

      linked(c1, c2, ass) : bool =
        EXISTS ae1, ae2 : (classifier(ae1) = c1 AND classifier(ae2) = c2
                           AND connection(ass) = (ae1, ae2))

      axiom1 : AXIOM (FORALL s, t : NOT (linked(s, t, isConnectedTo)
                                         AND linked(s, t, mayConnected)))
      axiom2 : AXIOM (FORALL t : EXISTS s : linked(s, t, isConnectedTo))
      ...
    END MobilePhoneSystem

A class diagram theory imports all theories that contain definitions of the classifiers existing in the class diagram, and at the same time defines associations between them as instances of the specification given in the Associations theory.

Another important part of this theory is the definition of conjectures. These conjectures are defined by the user, and recorded in the main theory for validation purposes. Hence, they are not processed in the same way as the other PVS data, which are processed automatically and considered as the semantics. They represent the kind of facts and properties that can be verified using our platform.
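
The two properties recorded in the theory above can be illustrated on a small finite model. The following Python sketch is not PVS and proves nothing in general; it only evaluates the properties on one hand-built object configuration. The object names, the representation of an association as a set of (telephone, station) pairs, and the function names exclusive_links and always_connected are assumptions made for this example.

```python
from typing import Set, Tuple

Link = Tuple[str, str]  # (telephone, station)

telephones: Set[str] = {"T"}
stations: Set[str] = {"S1", "S2"}

# Hypothetical snapshot: T is connected to S1 and may connect to S2.
is_connected_to: Set[Link] = {("T", "S1")}
may_connect: Set[Link] = {("T", "S2")}


def linked(t: str, s: str, association: Set[Link]) -> bool:
    """A telephone and a station are linked if the association holds the pair."""
    return (t, s) in association


def exclusive_links() -> bool:
    """First property: no telephone/station pair is in both associations."""
    return all(
        not (linked(t, s, is_connected_to) and linked(t, s, may_connect))
        for t in telephones
        for s in stations
    )


def always_connected() -> bool:
    """Second property: every telephone is connected to at least one station."""
    return all(
        any(linked(t, s, is_connected_to) for s in stations)
        for t in telephones
    )
```

Evaluating both functions on a snapshot is a cheap sanity check of a model instance; establishing the properties for all instances is what the PVS prover is used for.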
For instance, conjecture1 (axiom1 in the listing above) states that a station object and a telephone object are either connected or disconnected, but not both at the same time. Conjecture2 (axiom2) ensures that a telephone is permanently connected to a station. More about the formal semantics of UML in PVS-SL can be found in [2].

4 Concluding Remarks

One of the main objectives of our platform is to minimize the formal "stuff" that the user of the platform has to deal with. This in turn facilitates its industrial use. The OUN model, which is provided as a complement to the UML model, is concerned with specific aspects of reduced complexity, and is hence easy to express. In this respect, we have decided to use PVS-SL in this platform as a semantic foundation and not as a specification language. As a result, the user will not need to have an in-depth knowledge of the PVS formal notation and proof system.

PVS-SL offers a very general semantic foundation and a set of powerful tools. It is highly expressive and offers several mechanisms for formal analysis. For instance, it is possible to express and reason about infinite traces within PVS-SL, and this is important since OUN is trace-based. Compared to OCL, PVS-SL is highly expressive and provides stronger support for the description of several kinds of operations. For instance, although operations can be modelled by a recursive expression in OCL, it is the responsibility of the modeler to ensure that the recursion is well-defined. In PVS-SL, however, termination of a recursive function is handled by a built-in clause, the MEASURE construct, which generates a proof obligation if termination is in doubt.

Another criterion facilitating industrial use is the automation of the platform. We are currently developing a supporting environment, to which we refer as the Integrator.
The Integrator combines existing tool support for UML, namely Rational Rose [4], with the PVS toolkit and at the same time provides the functionalities they do not offer, in order to cover the whole development cycle from requirements capture to final code production [23].

References

[1] G. Agha, I.A. Mason, S. Smith, and C. Talcott. A Foundation for Actor Computation. Journal of Functional Programming, 7:1–71, 1997.
[2] D. Aredo, I. Traoré, and K. Stølen. An Outline of PVS Semantics for UML Class Diagrams (extended abstract). In Proc. of the 11th Nordic Workshop on Programming Theory (NWPT’99), Uppsala, Sweden, October 6-8, 1999.
[3] G. Booch, J. Rumbaugh, and I. Jacobson. The Unified Modeling Language User Guide. Addison Wesley Longman Inc, Reading, Massachusetts 01867, 1999.
[4] Rational Software Corporation. Rational Rose 98, 1998. Available at www.rational.com/products/rose/index.jtmpl.
[5] J. Crow, S. Owre, J. Rushby, N. Shankar, and M. Srivas. A Tutorial Introduction to PVS. In WIFT’95: Workshop on Industrial-Strength Formal Specification Techniques, Boca Raton, Florida, USA, April 1995.
[6] O.-J. Dahl and O. Owe. Formal Methods and the RM-ODP. Research report No. 261, Department of Informatics, University of Oslo, Norway, March 1998.
[7] A. Evans. UML class diagrams - Filling the Semantic Gap. Technical Report, York University, 1998.
[8] F. Hayes and D. Coleman. Coherent Models for Object-Oriented Analysis. In Proc. of the OOPSLA conference: Communications of the ACM, Phoenix, AZ, October 1991.
[9] ISO. A Formal Description Technique Based on the Temporal Ordering of Observational Behavior, September 1988. ISO Standard 8807.
[10] L.J. Jagadeesan and R. Jagadeesan. Causality and True Concurrency: a data-flow analysis of the pi-calculus. In Proc. of AMAST’95, LNCS 936, pages 277–291, 1995.
[11] B. Jonsson. Compositional Verification of Distributed Systems. PhD thesis, Uppsala University, Sweden, 1987.
[12] ISO-IEC JTC1/SC21/WG7. Reference Model of Open Distributed Processing (RM-ODP), 1995.
[13] R. Kneuper. Limits of Formal Methods. Formal Aspects of Computing, 9:379–394, 1997.
[14] R. Milner, J. Parrow, and D. Walker. A Calculus of Mobile Processes part I and II. Information and Computation, 100:1–77, 1992.
[15] A. Moreira and R. Clark. Combining Object-oriented Analysis and Formal Description Techniques. In Proc. of ECOOP’94, LNCS, volume 821, Bologna, Italy, 1994. Springer-Verlag.
[16] OMG. OMG Unified Modeling Language Specification, version 1.3, June 1999. OMG standard.
[17] O. Owe and I. Ryl. The Oslo University Notation: A Formalism for Open, Object-Oriented, Distributed Systems. Report No. 270, Department of Informatics, University of Oslo, Norway, August 1999.
[18] I. Stark. A Fully Abstract Domain Model for the pi-calculus. In Proc. of LICS’96, pages 36–42. IEEE Computer Society Press, 1996.
[19] K. Stølen. A Comparison of Eleven Specification Languages. Technical Report HWR-523, OECD Halden Reactor Project, Halden, Norway, March 1998.
[20] K. Stølen, T.W. Karlsen, P. Mohn, and H. Sandmark. Using CASE Tools on Formal Methods on Real-life Software Development of Distributed Systems. Technical Report HWR-522, OECD Halden Reactor Project, IFE Halden, Norway, March 1998.
[21] I. Traoré. The UML Specification of the Integrator. Research report No. 275, Department of Informatics, University of Oslo, Norway, August 1999.
[22] I. Traoré, D. B. Aredo, and K. Stølen. Tracking Inconsistencies in an Integrated Platform. Research report No. 274, Department of Informatics, University of Oslo, Norway, August 1999.
[23] I. Traoré and K. Stølen. Towards the Definition of a Platform supporting the Formal Development of Open Distributed Systems. Research report No. 271, Department of Informatics, University of Oslo, Norway, April 1999.
[24] J. M. Wing. A Specifier’s Introduction to Formal Methods. IEEE Computer, 23:8–24, September 1990.
Appendix B

Towards formalization of Structural UML Models in PVS

D. B. Aredo, I. Traoré and K. Stølen

Publication: D. B. Aredo, I. Traoré and K. Stølen: Towards formalization of Structural UML Models in PVS, Research Report No. 272, Department of Informatics, University of Oslo, August 1999. An abstract appeared in the Proc. of the 11th Nordic Workshop on Programming Theory (NWPT’99), October 6-8, 1999, Uppsala, Sweden.

Towards Formalization of Structural UML Models in PVS

Demissie B. Aredo, Issa Traoré, Ketil Stølen
Department of Informatics, University of Oslo, P. O. Box 1080 Blindern, N-0316 Oslo, Norway
Institute for Energy Technology, P. O. Box 173, N-1751 Halden, Norway
{demissie,issat,ketils}@hrp.no

Abstract

The Unified Modeling Language (UML) is a language for specifying, visualizing and documenting object-oriented systems, and serves as a standard OO modeling notation. As the semantics of UML constructs is given informally, in natural language, it is difficult to reason formally about the correctness of a system design. Formal methods provide a rigor that is lacking in most OO modeling notations in general and in the UML notations in particular. In this paper, we present work on the formalization of UML class diagrams. We assign formal semantics to UML class diagrams, using the PVS specification language (PVS-SL) as the underlying semantic foundation.

Keywords: Formal methods, Semantics, UML, PVS, Object-orientation

1 Introduction

Dealing with the complexity and heterogeneity of contemporary distributed systems is among the main concerns of developers of distributed systems. Powerful design mechanisms such as model structuring and re-usability, provided by object-orientation, have gained considerable popularity in the software community. Standards such as RM-ODP [19], for example, advocate the use of object-oriented (OO) frameworks in the development of open distributed systems.
Several OO design and analysis methodologies and notations have been proposed since the mid 1970s [26, 30]. The most recent and popular notation is the Unified Modeling Language (UML) [22], which resulted from a unification of the OMT [27], Booch [1], and Objectory [18] methods. UML became popular in the software community mainly due to its visual, intuitively appealing graphical notations and useful structuring mechanisms. It is based on standards and has powerful tool support such as Rational Rose [6].

A major drawback of most object-oriented methodologies, including UML, is their limitation in the context of formal model analysis. Because their semantics is not precisely defined, they lack the mathematical basis needed to undertake rigorous model analysis. Several works have been undertaken to provide a mathematical basis for the concepts underlying OO models. In general, three approaches to the formalization of OO modeling notations are identified: the supplemental approach, the OO-extended formal notation approach, and the methods integration approach [15].

In the supplemental approach, more formal constructs replace parts of the model that are expressed in an informal OO notation. The formalization work reported in [21] (using LOTOS [16] and Syntropy [5]) is based on this approach. In the OO-extended formal notation approach, an existing formal notation is extended with features that handle the notion of object-orientation, thus making it more compatible with OO notations. VDM++ [10], Z++ [20], and Object-Z [9] are examples of formalisms based on this approach. Although a rich body of formal systems may have resulted, such an extension often results in a semantics that is more complex, and suffers from a lack of supporting CASE tools [12, 4]. The main weakness of these approaches is that the developers still have to deal directly with a certain amount of formal artifacts.
This is a significant barrier to large-scale utilization of formal methods, mainly because of their esoteric nature.

Methods integration is a more workable approach that makes informal OO modeling concepts and notations more precise and amenable to rigorous analysis by integrating them with suitable formal specification techniques [14]. This is the most commonly used approach to formal system development. It enables developers to directly manipulate the graphical models they have created, without needing in-depth knowledge of the formal “stuff”, which is processed at the back-end [12]. The works published in [4, 15, 31] are, for instance, based on this approach.

In the case of UML, the Object Constraint Language (OCL) [22] has been proposed to make models amenable to rigorous analysis. The semantics of OCL is not mathematically defined either, and hence it does not provide sufficient facilities for formal reasoning [13]. We could formalize OCL and use it as a semantic basis. However, OCL is not suitable, mainly due to its limitations in expressing UML modeling concepts and the lack of strong CASE tool support. Hence, there is a strong need for formally defined semantics for UML constructs. In the sequel, the methods integration approach is used to propose semantics of UML class diagrams in the PVS specification language (PVS-SL) [24, 28, 29], and this contributes towards the formalization of the UML notation.

The rest of this paper is structured as follows: In Section 2, a brief overview of the formalisms involved, namely UML and PVS-SL, is presented. We also present a UML class diagram that will be used as a running example throughout this paper to illustrate different concepts. In Section 3, we introduce a general framework of formalization and define a satisfaction relation from the UML syntactic domain into the corresponding PVS semantic domain. In Section 4, we discuss the formalization of UML class diagrams in detail.
Finally, in Section 5, we make some concluding remarks and discuss further research issues.

2 Overview of the Formalisms

2.1 The PVS Specification Language

PVS [24, 28, 29] was developed for the design and analysis of formal specifications. It consists of a highly expressive specification language tightly integrated with a powerful interactive theorem-prover and exploits the synergy between them. In addition, it contains a proof-checker, which makes it possible to construct proofs interactively and to rerun them automatically after minor changes, and several other functionalities.

PVS-SL is based on a classical, typed higher-order logic. It supports a richer type system than standard higher-order logic and relies on an original approach to type checking [11]. The PVS type system has been augmented by predicate subtyping and dependent typing mechanisms. Subtyping simplifies type-checking and allows strong checks for consistency and invariants in a uniform manner [7]. For instance, partial functions can be accommodated in the logic of total functions by restricting their domains of definition. Subtyping, however, renders type checking undecidable. As a result, proof obligations, known as Type Correctness Conditions (TCCs), are generated during type-checking, and users are required to discharge them. A great many TCCs can be discharged automatically, whereas more involved ones require interactive use of the theorem-prover. A specification is considered fully type-checked only when all TCCs have been proved.

Specifications in PVS are organized into theories. A theory may contain type, variable, and constant declarations, definitions, axioms, and conjectures. PVS-SL supports modularity and reuse by means of parameterized theories, which make it possible to specify generic modeling elements and to define constraints, usually called assumptions, in terms of the parameters.
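The mechanisms described above (predicate subtypes, TCCs, and parameterized theories with assumptions) can be illustrated by a small sketch. It is not taken from the platform: the theory names bounded_counter and counter_user and every identifier declared in them are hypothetical, and the comments only indicate which obligations we would expect the type-checker to generate.

```pvs
% Hypothetical example; theory and identifier names are illustrative only.
bounded_counter [bound : nat] : THEORY
BEGIN
  ASSUMING
    % Constraint on the actual parameter, to be discharged at each instantiation
    bound_positive : ASSUMPTION bound > 0
  ENDASSUMING

  % Predicate subtype: the naturals that do not exceed bound
  Count : TYPE = {n : nat | n <= bound}

  % A partial operation is made total by restricting its domain
  NonFull : TYPE = {c : Count | c < bound}

  % TCC expected: c + 1 <= bound, which follows from c < bound
  incr(c : NonFull) : Count = c + 1

  % TCC expected: 0 < bound, which follows from the assumption bound_positive
  initial : NonFull = 0
END bounded_counter

counter_user : THEORY
BEGIN
  % Instantiation; an obligation for bound_positive (10 > 0) is expected here
  IMPORTING bounded_counter[10]

  % TCC expected: 3 <= 10
  three : Count = 3
END counter_user
```

In this sketch the assumption plays the role of a constraint on the theory parameter, while the subtype NonFull restricts the domain of incr so that it remains a total function.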
PVS-SL includes an extensive library of built-in theories, called the prelude, that provides several useful definitions and lemmas. The PVS type system contains basic types (boolean, integer, real) and type constructors (sets, tuples, records, functions). The record and function type constructors are used extensively in the sequel. A record is a finite list of fields of the general form

    R : TYPE = [# a1 : T1, ..., an : Tn #]

where the ai are called accessor functions and the Ti are type expressions. Given a record r : R, function-application-like terms ai(r), rather than the conventional ’dot’ notation, are used to access the i-th field of r. The structure of tuples is similar to that of records, except that the order of the fields is significant in tuples. Functions are of the general form

    [D1, D2, ..., Dn → R]

where the Di and R are type expressions. Given a type expression T, the type of sets of elements of T can be specified in two different forms, pred[T] and setof[T], each of which is a shorthand for [T → bool] and is predefined in the PVS prelude.

The capability of PVS-SL to support the definition of Abstract Data Types (ADTs), from which a theory is automatically synthesized during type-checking, and the presence of powerful decision procedures are particularly useful mechanisms for the specification of types.

In this section, we presented a brief overview of the PVS environment. For a more detailed description of the PVS environment, the reader can refer to the system documentation [7, 24, 25].

2.2 The Unified Modeling Language

The Unified Modeling Language (UML) is based on a set of OO modeling techniques that have been standardized by the Object Management Group (OMG). It rapidly became an important industry standard for modeling software systems. The UML notation is rich and full bodied.
It comprises two main subdivisions: notations for structural modeling elements like classes, interfaces, and static relationships among them; and notations for behavioral modeling elements like objects, messages, and state machines. In this report, we focus on the formalization of structural modeling constructs, the UML class diagrams.

A class diagram is important for modeling the static design view of a system. It depicts the existence and static structure of classes, interfaces, and relationships among them. In the rest of this section we describe the major elements of a class diagram. Figure 1 shows a typical UML class diagram that contains the major modeling constructs.

Class: A class is the most important component of a UML class diagram. It is rendered as a rectangular box with three compartments. The top compartment contains the class name, the middle one contains a set of attributes, and the last compartment contains a set of operations. Types and initial values of attributes, and signatures (except the name) of operations, are all optional. In Figure 1, Person and Course are examples of classes.

Interface: An interface specifies a collection of operations of a class, a component, or a subsystem without specifying the internal structure. An interface is rendered as a rectangular box with compartments and the keyword «Interface», i.e. as a stereotyped class, in order to expose its operations and other properties. It may also be rendered as a small circle with the identifier of the interface placed close to it. The list of operations supported by the interface is placed in the operations compartment, whereas the attributes compartment can be omitted since it is always empty. An interface can be realized by several classes and a class may realize several interfaces.
For example, Addition is an interface and is realized by the CourseOffering class.

[Figure 1: A UML Class Diagram. It shows the classes Person (name), Student (major), PhdStud, Faculty (tenure), Course (title: String, credithrs: Nat, open(), addStud()) and CourseOffering (location, open()), the interface Addition (addStud()), and the associations attends (multiplicities 3..10 and 4) and teaches (multiplicities 1 and 0..4).]

Relationships: A relationship depicts the existence of links among entities of a class diagram. The following are the most common relationships.

Association is a relationship between classifiers that specifies how the objects of the classifiers are related. An association is graphically rendered as a solid line connecting the classifiers involved. Though an association may, in general, involve an arbitrary number of classifiers, in this paper we consider only binary associations. A role and a multiplicity of objects can also be specified. The multiplicity of a classifier w.r.t. a given association is a subset of the set of natural numbers that specifies the possible number of objects of the classifier that can be in association with an object of its counterpart(s). In Figure 1, for example, attends is an association between objects of the Student and CourseOffering classes.

Generalization is an inheritance relationship between a child and a parent class, such that objects of the child class are substitutable for objects of the parent class. In other words, the child class inherits the structure and behavior of the parent class. Generalization is denoted by a solid line with a hollow arrow head directed from the child class towards the parent class. In Figure 1, there is a generalization relationship between the Person and the Student classes.

Aggregation is a special kind of association between a whole and a part. It is denoted by a solid line with a hollow diamond end pointing to the whole.
Composition is a kind of aggregation which specifies that an object of a part class can be contained in at most one object of the whole class. Composition is depicted by a solid line with a solid-filled diamond end pointing to the composite class. In Figure 1, a Course object is a composition of objects of the CourseOffering class.

Realization is a relationship between an interface and a class that implements the operations specified in the interface. For example, the class CourseOffering realizes the interface Addition.

A minimal requirement, such as that no PhD student may both teach and attend the same course, cannot be expressed formally in UML. If desired, such requirements must be added in an ad hoc manner or expressed in OCL. In our approach, however, such a requirement can be described precisely and specifications can be verified against it.

3 General Formalization Approach

A formal specification language is described as a triple <Syn, Sem, Sat>, where Syn and Sem are, respectively, the syntactic and semantic domains of the language, and Sat ⊆ Syn × Sem is a satisfaction relation between them [34]. For a given specification s ∈ Syn and d ∈ Sem, if Sat(s, d), we say that s is a specification of d, and d is a semantic definition of s. The satisfaction relation associates a meaning or interpretation with the syntactic elements. Semantic mappings are special cases of the Sat relation.

In our case, the aim is to assign formal semantics to the modeling elements of UML class diagrams, with PVS-SL as the semantic foundation. Thus, we consider the UML notations as the syntactic domain and the corresponding set of PVS semantic entities as the semantic domain, and define a satisfaction relation R as follows:

    R ⊆ Syn_UML × Sem_PVS

where Syn_UML denotes the set of UML syntactic constructs and Sem_PVS denotes the PVS semantic entities expressed in the PVS specification language.
The general formalization process in our approach can be summarized as follows:

• Every element of a UML class diagram is represented as a PVS theory.
• In a theory, appropriate types are specified whose elements represent instances of the corresponding model element in the UML class diagram. Operations that manipulate the types, and requirements on the instances of the individual modeling element, are specified in the theory as predicates, axioms, theorems, and conjectures.
• A class diagram is represented by a theory that instantiates all elements by importing their respective theories. Global invariants and constraints that involve several elements are specified in the theory that represents the class diagram.

The satisfaction condition for a class diagram and its corresponding theory is obtained from the conjunction of the satisfaction conditions of the elements. That is, for a given modeling element d of a class diagram and a PVS theory t that represents the element, t satisfies d if and only if R(d, t). For a UML class diagram D and a PVS theory T that represents D, T satisfies D if and only if for every element d ∈ D there is an instance of a theory t in T such that R(d, t). Symbolically,

    R(D, T) ⇔ (∀ d : D) : ((∃ t : T) : t ⊑ T ∧ R(d, t))

where t ⊑ T denotes the fact that the theory t is instantiated in the theory T, either by the importing mechanism or by the theory abbreviation mechanism.

4 Formalization of UML Class Diagram

4.1 Interfaces

An interface is a description of the externally visible set of operations of a class or a component. It is used for specifying the services offered by the class or component. An interface is represented by a theory which contains, among other things, a declaration of a record type whose fields specify the name of the interface, the set of operations in the interface, and a set of parent interfaces (multiple inheritance is supported in UML).
The general scheme of a PVS theory that represents an interface is given as follows:

    Interface : THEORY
    BEGIN
      Operation : TYPE
      Interface : TYPE = [# interfaceID : string,
                            operations  : setof[Operation],
                            parents     : setof[Interface] #]
    END Interface

The Addition interface described in Figure 2 can be specified as an instance of the record type Interface as follows:

    Addition : Interface = (# interfaceID := "Addition",
                              operations  := {op | op = addStud},
                              parents     := { } #)

More semantic concepts of interfaces, such as inheritance, will be discussed in later sections.

4.2 Classes

We represent a class as a PVS record type whose fields capture the structure of the class, i.e. its name, set of attributes, and set of operations. As a class can be a subclass of one or more classes, and can implement several interfaces, the representation of a class in PVS should include fields that capture the parent classes and the set of interfaces the class implements. Types defined in the parent classes and interface(s) can be made accessible by IMPORTING the theories containing the declarations.

[Figure 2: Interface Realization. The class CourseOffering (location, open()) realizes the interface Addition (addStud()).]

A general scheme of a theory that represents a class is as follows:

    Class : THEORY
    BEGIN
      IMPORTING Interface
      ClassID, Attribute : TYPE
      Class : TYPE = [# classID    : ClassID,
                        attributes : setof[Attribute],
                        operations : setof[Operation],
                        parents    : setof[Class],
                        interfaces : setof[Interface] #]
    END Class

Based on the above transformation scheme, the class CourseOffering depicted in Figure 2 can be represented as shown below. The class CourseOffering realizes the interface Addition. Hence, the set of interfaces contains the interface Addition declared above as an instance of type Interface.
    a : VAR Attribute;
    c : VAR Class;
    o : VAR Operation;
    i : VAR Interface

    location : Attribute
    c        : Class
    open     : Operation

    CourseOffering : Class = (# classID    := "CourseOffering",
                                attributes := {a | a = location},
                                operations := {o | o = open},
                                parents    := {c | false},
                                interfaces := {i | i = Addition} #)

In PVS-SL, every identifier needs to be typed. In UML class diagrams, however, the type of an attribute may not be specified explicitly. In such a case, a dummy type Void is introduced as an uninterpreted type, so that attributes whose types are not explicitly specified are declared as Void.

In UML, there are notions of abstract, root, and leaf classes, parameterized elements (e.g. template classes), visibility of attributes and operations, etc. [2]. These notions can be specified with a slight modification to the generic class representation. For instance, the concept of template classes directly matches the construct of parameterized theories in the PVS specification language.

4.3 Associations

In OO modeling techniques, there are several alternatives for interpreting associations and links in the context of classes and objects [3]: (1) as a set of data links, in which case the objects involved in the association know about one another; (2) as a separate association class; (3) as communication links. In our case, we represent an association as a stand-alone PVS theory. This corresponds to the representation of relations in OUN (the Oslo University Notation) [23] and hence makes specification less complicated. OUN is one of the notations involved in the development of the multi-formalism platform, the Integrator [33], that is proposed to support formal development of open distributed systems.

We define an association generically as a parameterized theory, which serves as a template for all associations and aggregations that occur in the class diagram.
The list of formal parameters consists of the classes involved in the association and their respective roles (uninterpreted types), and the corresponding multiplicities (subsets of the set of natural numbers). This generic theory defines an instance of an association as a relation (a set of ordered pairs) on the sets of objects of the involved classifiers. The order of the entries of an ordered pair indicates the direction of navigation of the association. This can be relaxed to the general case of bidirectional association simply by using records instead of ordered pairs. Next, we give a scheme of a generic association theory and represent the association given in Figure 3 by instantiating the generic association.

[Figure 3: Association. The association attends links Student (multiplicity 3..10) and CourseOffering (location, open(); multiplicity 4).]

    Association(C1, C2, R1, R2: TYPE, M1, M2: TYPE = setof[nat]) : THEORY
    BEGIN
      obj1 : VAR C1
      obj2 : VAR C2
      Association : TYPE = setof[[obj1 : C1, obj2 : C2]]
      assoc : VAR Association
      m : nat
      f1 : [below[m] → C1]
      f2 : [below[m] → C2]
      % m = max(card(M1), card(M2))
      % we import the cardinality theory from PVS library
      th1 : THEORY = cardinality@cardinality[C1, m, f1]
      th2 : THEORY = cardinality@cardinality[C2, m, f2]
      axiom12: AXIOM
        FORALL(obj1 : C1), (obj2 : C2) :
          (member(th2.card({obj2 | member((obj1, obj2), assoc)}), M2)) AND
          (member(th1.card({obj1 | member((obj1, obj2), assoc)}), M1))
    END Association

In theory Association, C1 and C2 specify the classes whose objects are involved in the association, R1 and R2 denote the roles of their respective objects, whereas M1 and M2 are their respective multiplicities. The axiom axiom12 constrains the number of objects of one class that can be in the association with a single object of the other class. The fact that the instances of the involved elements play the roles R1 and R2 is not explicitly specified. However, this can be addressed, for instance, by defining a record type whose fields are a classifier, its multiplicity and its role.
Then, the association is defined to be a relation on the instances of such a record type.

Once the generic association theory is defined, the theory that represents a class diagram instantiates, for every association, the generic theory with actual parameters. A naming conflict may arise, since variables or types with the same identifiers are declared during every instantiation. The PVS theory abbreviation mechanism discussed in Section 2.1 is used to address this problem. For example, the class diagram theory may define the associations Attends and Teaches by including the following lines in the specification:

    Attends : THEORY = Association (Student, CourseOffering, attendant, session,
                                    {n : nat | 3 ≤ n ∧ n ≤ 10}, {4})

    Teaches : THEORY = Association (Faculty, CourseOffering, lecturer, session,
                                    {1}, {n : nat | 0 ≤ n ∧ n ≤ 4})

To distinguish between the two relations that specify the associations, we prefix them with the identifier of their corresponding theory, e.g. Attends.Association and Teaches.Association.

4.4 Generalization/Specialization

Generalization/specialization is an inheritance relationship between a superclass and a subclass. In this kind of relationship, objects of the subclass inherit the structure and behavior of objects of the superclass and, in addition, can declare attributes and operations locally. Unlike the other relationships, we represent generalization as part of the subclass involved. The superclass is represented, like any other class, by a theory. The theory that represents the subclass imports, among others, the theory of the superclass, defines a record type whose fields contain declarations of the local attributes and operations, and concatenates this record type with the record types declared in the imported superclass theories.
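The concatenation of record types described above can be sketched as follows. This is a hedged illustration rather than the scheme used by the platform: attributes are assumed to be record fields of type string, the field lists are concatenated by hand, and the theory name person_student, the function as_person, and the lemma name_preserved are all hypothetical.

```pvs
% Hypothetical sketch; the field types and all names are assumptions.
person_student : THEORY
BEGIN
  % Superclass record with its single attribute
  Person : TYPE = [# name : string #]

  % Subclass record: the inherited field followed by the local one
  Student : TYPE = [# name : string, major : string #]

  % Substitutability: every Student can be viewed as a Person
  as_person(s : Student) : Person = (# name := name(s) #)

  % The projection preserves the inherited attribute
  name_preserved : LEMMA
    FORALL (s : Student) : name(as_person(s)) = name(s)
END person_student
```

The accessor name is applied to values of both record types; the sketch relies on the type of the argument to determine which field is meant.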
[Figure 4: Generalization/Specialization of Classes. Student (major) specializes Person (name).]

The generalization relationship between objects of class Person and class Student depicted in Figure 4, extracted from Figure 1, is specified as follows:

    name, major : Attribute

    Person : Class = (# classID    := "Person",
                        attributes := {a | a = name},
                        operations := {o | false},
                        parents    := {c | false},
                        interfaces := {i | false} #)

    Student : Class = (# classID    := "Student",
                         attributes := {a | a = major},
                         operations := {o | false},
                         parents    := {c | c = Person},
                         interfaces := {i | false} #)

One important requirement on generalization is that it is a transitive, antisymmetric relationship. That is, for any two classes A and B, if A is a subclass of B and B is a subclass of A, then they must be identical. Symbolically,

    (A ≺ B ∧ B ≺ A) ⇒ A = B

where ≺ denotes a generalization relationship. In our case, this requirement can be captured by the axiom axgen specified below. The axiom states a sufficient condition to avoid cyclic inheritance.

    A, B, c' : VAR Class

    allparents(c) : RECURSIVE setof[Class] =
      IF parents(c) = ∅ THEN ∅
      ELSE parents(c) ∪ ⋃ {allparents(c') | c' ∈ parents(c)}
      ENDIF
    MEASURE (LAMBDA c : parents(c) ≠ ∅)

    axgen : AXIOM NOT (B ∈ allparents(A) ∧ A ∈ allparents(B))

4.5 Aggregation

Aggregation is a special kind of association that depicts a conceptual whole-part relationship. A simple aggregation is entirely conceptual and does nothing more than distinguish the whole from the part [2]. Another variant of aggregation, composition, adds a semantics of strong ownership and coincidence of the lifetime of a part with that of the whole. Parts with non-fixed multiplicity can be created after the composite itself, but once created they will die with it. We represent a simple aggregation by instantiating the generic theory Association with appropriate parameters.
For a composition, however, we define the composite class as a record type with one field for a set of objects of a part class, in addition to the fields that specify its structure. For instance, the composite class Course and a part class CourseOffering (see Figure 1) can be specified as follows:

    Course : THEORY
    BEGIN
      Course : TYPE = [# oid       : String,
                         title     : String,
                         credithrs : nat,
                         open      : [Course → bool],
                         addStud   : [Course, StudInfo → Course],
                         sessions  : setof[CourseOffering] #]

      iscomp : THEOREM (∀ c1, c2 : Course) :
                 c1 ≠ c2 ⇒ sessions(c1) ∩ sessions(c2) = ∅
    END Course

Though the name of an aggregation is optional, in our formalization we use the name Aggreg as a placeholder so that it fits the Association template.

4.6 Semantics for UML Class Diagram

Finally, a class diagram is represented by a theory that puts all the constituent theories together. Constraints that involve instances of two or more entities, and global invariants on the behavior of the system, are specified in the theory that represents the class diagram. Assuming that every entity of the class diagram given in Figure 1 is represented according to the above framework, the following is a sketch of a theory that specifies the class diagram as a whole.

    ClassDiagramName : THEORY
    BEGIN
      [declarations]
      IMPORTING Person, Student, PhdStud, Faculty
      IMPORTING Course, CourseOffering, Addition

      Attends : THEORY = Association (Student, CourseOffering, attendant, session,
                                      {n : nat | 3 ≤ n ∧ n ≤ 10}, {4})

      Teaches : THEORY = Association (Faculty, CourseOffering, lecturer, session,
                                      {1}, {n : nat | 0 ≤ n ∧ n ≤ 4})

      conj1 : CONJECTURE
        (FORALL (co : CourseOffering) :
           EXISTS (f : Faculty) : (member((f, co), teaches)))

      conj2 : CONJECTURE
        (FORALL (ph : PhdStud), (c : CourseOffering) :
           NOT (member((ph, c), attends) AND member((ph, c), teaches)))

      [invariants and global constraints]
    END ClassDiagramName
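The multiplicity constraints passed to Attends and Teaches above could also be expressed in the style of the following sketch. It is an alternative formulation under stated assumptions, not the generic Association theory itself: it assumes a PVS version whose prelude provides finite_set and card, it checks only one direction of the association, and the names assoc_multiplicity, Assoc, targets and respects are hypothetical.

```pvs
% Hypothetical sketch; theory and identifier names are illustrative only.
assoc_multiplicity [C1, C2 : TYPE] : THEORY
BEGIN
  % An association instance is a finite set of links (ordered pairs)
  Assoc : TYPE = finite_set[[C1, C2]]

  % The objects of C2 linked to a given object x of C1
  targets(a : Assoc, x : C1) : set[C2] = {y : C2 | member((x, y), a)}

  % Multiplicity constraint: for every x, the number of targets lies in M2.
  % A TCC is expected here, requiring that targets(a, x) is finite.
  respects(a : Assoc, M2 : set[nat]) : bool =
    FORALL (x : C1) : member(card(targets(a, x)), M2)
END assoc_multiplicity
```

With this formulation, a multiplicity such as {4} or {n : nat | 0 ≤ n ∧ n ≤ 4} is passed as the argument M2 of respects, and the finiteness obligation has to be discharged once for the theory rather than for each association.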
The class diagram theory imports or instantiates the theories corresponding to all the classes and interfaces in the class diagram, and instantiates the generic theory Association with actual parameters for every association. Another important aspect of this theory is the specification of global constraints and conjectures. Conjectures are defined by the user and recorded in the main theory for validation purposes; they are not processed in the same way as the other PVS declarations, which are processed automatically and regarded as the semantics. They represent the kind of facts and properties that can be verified using our platform. For instance, conj1 states the requirement that every course offering is taught by at least one faculty member. The conjecture conj2 ensures that a PhD student may attend or teach a course offering, but not both.

5 Conclusion and Future Work

Several works on the formalization of UML, mainly using Z [32] as semantic foundation, exist in the literature [12, 13, 15, 31, 17]. Evans [12] and Shroff and France [31] developed an abstract description of the UML class diagram using the Z notation as the underlying formalism. In their approach, the fundamental elements of a UML class diagram are first formally represented as Z schemas. Then, the system view of the class diagram is formally characterized by a schema that composes the element schemas. The static aspect (attributes and identifier) of a class is represented as a schema called Class Schema, whereas attributes and identifiers of instances are represented as state variables. Class invariants are specified in the predicate part of a Z class schema. Jacobs et al. [17] translate Java classes into the higher-order, classical logic of the PVS tool. A co-algebraic approach is used to give semantics to Java classes. PVS is used as a back-end to the LOOP (logic of object-oriented programming) tool, which automatically provides a logical semantics for Java.
Most of the formalization work done on UML notations has used Z as the underlying formal notation. In our case, we use PVS-SL as the underlying semantic foundation. The main reason behind this choice is that PVS-SL seems to be one of the most suitable languages in the context of the integrated platform that we are building to support the formal development of open distributed systems. PVS supports a functional specification style, uses conventional logic and can be mechanized easily, whereas procedural specifications such as Z involve some kind of Hoare logic, for which it is more difficult to provide mechanized deduction. The platform integrates the UML and OUN (Oslo University Notation) [8, 23] specification formalisms. OUN is a trace-based formal notation targeted towards formal reasoning about open distributed systems. PVS provides a general semantic foundation and a set of powerful tools, among others a type checker, a model checker and a theorem prover, and their synergistic integration. One instance of the high expressiveness of PVS-SL is its ability to directly support reasoning about infinite traces, which matches the needs of OUN. As we mentioned in the introduction, the semantic artifacts are processed at the back-end of the tool we are currently developing, called the Integrator [33], for the automation of the platform.

The formalization framework outlined in this paper is implemented in an integrated platform that supports formal development of open distributed systems, and encouraging results have been obtained. In the future, we will extend the formalization work to other UML constructs. Behavioral modeling entities such as interaction diagrams and statechart diagrams are among the targets of our future work. We will also introduce various mechanisms, such as refinement proof rules and validation by user-defined conjectures, that are necessary for rigorous formal reasoning in the context of the Integrator platform.
References

[1] G. Booch. Object-Oriented Analysis and Design with Applications. Benjamin Cummings, Redwood City, California, 1st edition, 1991.
[2] G. Booch, J. Rumbaugh, and I. Jacobson. The Unified Modeling Language User Guide. Addison Wesley Longman Inc, Reading Massachusetts 01867, 1999.
[3] Ruth Breu, Ursula Hinkel, Christoph Hofmann, Cornel Klein, Barbara Paech, Bernhard Rumpe, and Veronika Thurner. Towards a Formalization of the Unified Modeling Language. In Mehmet Aksit and Satoshi Matsuoka, editors, ECOOP’97 – Object-Oriented Programming, 11th European Conference, volume 1241 of LNCS, pages 344–366. Springer, 1997.
[4] J.-M. Bruel and Robert B. France. Transforming UML Models to Formal Specifications. In the Proc. of the OOPSLA’98 Workshop on Formalizing UML. Why? How?, Vancouver, Canada, October 1998.
[5] S. Cook and J. Daniels. Let’s Get Formal. Journal of Object-Oriented Programming (JOOP), pages 22–24, July 1994.
[6] Rational Software Corporation. Rational Rose 98, 1998. Available at www.rational.com/products/rose/index.jtmpl.
[7] J. Crow, S. Owre, J. Rushby, N. Shankar, and M. Srivas. A Tutorial Introduction to PVS. In WIFT’95: Workshop on Industrial-Strength Formal Specification Techniques, Boca Raton, Florida, USA, April 1995.
[8] O.-J. Dahl and O. Owe. Formal Methods and the RM-ODP. Research report No. 261, March 1998. Department of Informatics, University of Oslo, Norway.
[9] D. Duke. Object-Oriented Formal Specification. PhD thesis, University of Queensland, 1991.
[10] E.H. Dürr and N. Plat. VDM++ Language Reference Manual. Afrodite (ESPRIT-III project) document AFRO/CG/ED/LRM/V10, Cap Volmac, 1995.
[11] B. Dutertre and S. Schneider. Embedding CSP in PVS: An Application to Authentication Protocols. In Theorem Proving in Higher Order Logics: 10th International Conference, TPHOLs ’97, volume 1275 of Lecture Notes in Computer Science, pages 121–136, Murray Hill, NJ, August 1997. Springer-Verlag.
[12] A. Evans. Reasoning with UML Class Diagrams.
In the Proc. of WIFT’98. IEEE Press, 1998.
[13] A. Evans, J-M. Bruel, R. France, K. Lano, and B. Rumpe. Making UML Precise. In the Proc. of OOPSLA’98, Vancouver, Canada, October 1998.
[14] R. B. France, J.-M. Bruel, and M. M. Larrondo-Petrie. An Integrated Object-Oriented and Formal Modeling Environment. Journal of Object-Oriented Programming (JOOP), 10(7), December 1997.
[15] R. B. France, A. Evans, K. Lano, and B. Rumpe. The UML as a Formal Modeling Notation. Computer Standards & Interfaces, 19:325–334, 1998.
[16] ISO. A Formal Description Technique Based on the Temporal Ordering of Observational Behavior, September 1988. ISO Standard 8807.
[17] B. Jacobs, J. van den Berg, M. Huisman, and M. van Berkum. Reasoning about Java Classes. In the Proc. of OOPSLA’98, pages 329–340. ACM Press, 1998.
[18] I. Jacobson, M. Christerson, P. Jansson, and G. Övergaard. Object-Oriented Software Engineering: A Use Case Driven Approach. Addison-Wesley, Wokingham, England, 1992.
[19] ISO-IEC JTC1/SC21/WG7. Reference Model of Open Distributed Processing (RM-ODP), 1995.
[20] K. Lano and H. Haughton. The Z++ Manual. Technical Report, Imperial College, London, 1994.
[21] A. Moreira and R. Clark. Combining Object-oriented Analysis and Formal Description Techniques. In the Proc. of ECOOP’94, LNCS, volume 821, Bologna, Italy, 1994. Springer-Verlag.
[22] OMG. OMG Unified Modeling Language Specification, version 1.3, June 1999. OMG standard.
[23] O. Owe and I. Ryl. The Oslo University Notation: A Formalism for Open, Object-Oriented, Distributed Systems. Report No. 270, August 1999. Department of Informatics, University of Oslo, Norway.
[24] S. Owre, J. Rushby, N. Shankar, and F. von Henke. Formal Verification for Fault-tolerant Architectures: Prolegomena to the design of PVS. IEEE Transactions On Software Engineering, 21(2):107–125, February 1995.
[25] S. Owre, N. Shankar, J. Rushby, and D. W. Stringer-Calvert. PVS System Guide, version 2.3.
Computer Science Laboratory, SRI International, Menlo Park, CA, September 1999.
[26] J. Rumbaugh and M. Blaha. Tutorial Notes: Object-Oriented Modeling and Design. In the Proc. of OOPSLA’91 Conference, Phoenix, Arizona, October 1991.
[27] J. Rumbaugh, M. Blaha, W. Premerlani, F. Eddy, and W. Lorensen. Object-Oriented Modeling and Design. Prentice Hall, Englewood Cliffs, N.J., 1991.
[28] J. Rumbaugh, I. Jacobson, and G. Booch. The Unified Modeling Language, Reference Manual. Addison Wesley Longman Inc., 1999.
[29] J. Rushby. Specification, proof checking, and model checking for protocols and distributed systems with PVS. In FORTE X/PSTV XVII ’97: Formal Description Techniques and Protocol Specification, Testing and Verification, November 1997.
[30] S. Shlaer and S. Mellor. Object-oriented Systems Analysis: Modeling the World in Data. Yourdon Press Computing Series, Prentice Hall, Englewood Cliffs, NJ, 1991.
[31] M. Shroff and R. B. France. Towards a formalization of UML Class Structures in Z. In the Proc. of the COMPSAC’97, 1997.
[32] J. M. Spivey. The Z Notation: A Reference Manual. Prentice-Hall International, 2nd edition, 1992.
[33] I. Traoré. The UML Specification of the Integrator. Research report No. 275, August 1999. Department of Informatics, University of Oslo, Norway.
[34] J. M. Wing. A Specifier’s Introduction to Formal Methods. IEEE Computer, 23:8–24, September 1990.

Appendix C

An Integrated Framework for Formal Development of Open Distributed Systems

I. Traoré, D. B. Aredo and H. Ye

Publication: I. Traoré, D. B. Aredo and H. Ye: An Integrated Framework for Formal Development of Open Distributed Systems, the Journal of Information and Software Technology (IST), Elsevier Science, a Special Issue on Software Engineering, Applications, Practices and Tools, from the ACM SAC 2003, vol. 46, no. 5, pp. 281-286, April 15, 2004. An earlier version appeared in the Proc.
of ACM Symposium on Applied Computing (SAC 2003), March 9-12, 2003, Melbourne, Florida, USA.

An Integrated Framework for Formal Development of Open Distributed Systems

Issa Traoré¹, Demissie Aredo² and Hong Ye¹
¹ Department of ECE, University of Victoria, Victoria B.C. V8W 3P6, Canada
² Norwegian Computing Center, P. O. Box 114 Blindern, N-0314 Oslo, Norway

Abstract

This paper contributes to the discussion on issues related to the formal development of open distributed systems (ODS). The deficiencies of traditional formal notations in this setting are highlighted. We argue that there is no single formalism exhibiting all the features required to capture properties of ODSs. As a solution, we propose an integrated development framework that involves two notations: the Unified Modeling Language (UML) and the Prototype Verification System (PVS). We discuss the motivation for the choice of these notations, provide an overview of a CASE tool we have developed to support the proposed framework, and present a case study to demonstrate our approach.

Keywords: Formal Methods, Open Distributed Systems, UML, PVS, Multi-formalism, Object-orientation

1 Introduction

Motivated by the need for modeling the dynamic features of object-oriented programming languages and openness in distributed applications, the study of open and dynamically extendable systems has become a very popular research area. In fact, since the late 80s, much research within theoretical computer science has been directed towards this kind of system. The emphasis has mainly been put on semantic issues; in particular, on how such systems should be represented faithfully and fully abstracted. This has, for example, led to the development of the Pi-calculus [7], and to new refinements of the Actor model [1]. Most of the early proposals have a strong operational flavor. More recent denotational approaches are rather technical, and in most cases directed towards the Pi-calculus.
The above-mentioned research attempts to find mathematical models suitable to describe the semantics of systems. The emphasis in our work is not on the semantics of systems, but rather on formal system development. Existing formal development methods suffer from certain limitations that constrain their application to large-scale projects; in particular, their esoteric nature is a serious obstacle. This fact is well expressed by Kneuper [6] as follows:

    Software development is done by people, not by machines. No matter how ’good’ a development method is, it will only be successful if the developers who are to use it are willing and able to do so.

Most specification techniques supporting the development of open distributed systems, such as the UML (Unified Modeling Language) [8], lack formal semantics and the various reasoning facilities provided by formal development methods. Moreover, we are not aware of any conventional formal development method that is able to fully handle the flexible, extendable and very dynamic features characterizing contemporary distributed systems. In RM-ODP [5], formal description techniques such as LOTOS, Z, SDL and Estelle are proposed for the specification of systems from various viewpoints. But, as pointed out in [2], these languages are only partly satisfactory. For instance, we may use Z for the description of the static parts of the information viewpoint, but it is not suitable for dealing with the dynamic aspects. SDL and Estelle give little support for formal reasoning. LOTOS is a flexible description technique but, in our opinion, mainly suitable for the design phase.

Taking the above remarks into account, the challenge is to build a platform that exhibits the following capabilities:
- to be grasped and used in an industrial context; this requires characteristics such as communicability and user friendliness.
- to support major aspects such as openness and dynamic reconfiguration exhibited by open distributed systems.
- to produce formal specifications that are amenable to rigorous verification and validation.
- to have efficient tool support, a prerequisite for its application to large-scale systems.

We are not aware of any single specification technique or method that provides all these capabilities. One obvious solution is to build a completely new method from scratch. However, this is extremely costly. Instead, we propose a multi-formalism approach where we adapt and integrate already existing technologies. More explicitly, based on the evaluation of several existing methods and CASE tools, we propose a platform based on the UML, for specification and refinement, and on the PVS-SL (Prototype Verification System-Specification Language) [9] for semantic foundation.

The rest of the paper is organized as follows: In Section 2 we give an overview of the UML and discuss the rationale behind our choice. In Section 3, we give an overview of our formalization framework. Then, in Section 4, we present a case study of a network reconfiguration protocol. Finally, in Section 5 we make some concluding remarks.

2 Modeling Open Distributed Systems Using the UML

The choice of UML was dictated by the fact that it is built on an object-oriented framework and provides several capabilities, such as extensibility mechanisms (e.g. stereotypes) and dynamic and multiple classification, which are useful for the description of open distributed systems. In addition, UML provides an underlying methodology for specification and refinement and a graphical notation which contributes to communicability and friendliness; very importantly, UML is an international standard for object-oriented modeling.

2.1 Support for Open Distribution

Being an object-oriented approach, UML provides several capabilities such as encapsulation, data abstraction, extensibility, reusability and flexibility, which are essential features in modeling ODSs.
Among the extensibility mechanisms, we can mention stereotypes for adding new building blocks, tagged values for creating new properties for existing constructs, and constraints for extending the semantics of a UML construct. UML provides mechanisms for handling the dynamic nature of an object type, which can be helpful in modeling dynamic reconfiguration in the context of open distribution. This is achieved through a set of interfaces that a class may implement. An instance of such a class will support all of those interfaces but, depending on the context, it may present only one or more of them as relevant. Each of these interfaces represents a role that an object can play over time. Dynamic typing can also be rendered through an interaction diagram, by displaying the role of each instance of the corresponding class in brackets below the object’s name or by connecting each variant with a become message. UML also provides several facilities for modeling distributed architecture, especially component and deployment diagrams. We use component diagrams in conjunction with object diagrams and interaction diagrams, as mentioned previously, to model mobility.

2.2 Limitations

In spite of the benefits it provides, UML has several limitations in the context of the formal development of open distributed systems. The graphical constructs provided by UML are not enough to achieve a complete and precise specification of the system. For instance, in [3] several instances of incompleteness in the static semantic model of UML are reported, especially concerning the definitions of the concepts of aggregation, inheritance, constraints on inheritance hierarchies and abstract operation descriptions.
In order to fill this gap, there is a need for extending the UML notation with respect to two main objectives:
• The description of additional constraints on objects in the model, such as invariants on classes and types, abstract definitions of operations and attributes, non-functional requirements, etc.
• The definition of a formal semantics for the different constructs involved, in order to remove all ambiguities.

The first objective is generally accomplished using natural language, resulting in ambiguities. An alternative is the Object Constraint Language (OCL) [10], an assertion language that is easy to read and write and is used to specify the well-formedness of the modeling abstractions provided by UML. OCL has modeling constructs for types, classes, interfaces and associations, but its expressiveness is relatively limited in the context of dynamic aspects of systems and, as pointed out in [3], the semantics of OCL is not mathematically defined. Hence, in order to achieve the objectives mentioned earlier, we have decided to use PVS as the semantic foundation for our platform.

3 Formalization of Object-oriented Models

3.1 Overview

Several works have attempted to provide a mathematical basis for the concepts underlying object-oriented models [3]. Some of these approaches consist of adapting or extending a novel or existing formal description technique with object-oriented concepts. Others derive a formal specification from the semi-formal (or informal) model built with existing object-oriented notations such as UML or OMT. The main problem with these approaches is that the user has to deal with a certain amount of formal artifacts and, as we have already argued, this can be a barrier to their application in industrial settings. A third approach, which has been adopted in this platform, consists of assigning a formal semantics to an existing object-oriented notation.
In this case, the formal “stuff” is hidden behind the graphical notation, and the user deals with the graphical model, while the formal artifacts are processed automatically at the back-end.

PVS specifications are organized into a collection of theories which correspond to specification modules. A theory may consist of type, constant, axiom and theorem definitions. PVS provides a library of built-in theories, called the prelude, that are reusable specifications. The PVS semantics that we define for a given UML diagram consists of generic PVS definitions common to all UML constructs and a collection of PVS definitions specific to the application. The generic definitions are organized into several parameterized PVS theories that are installed in the PVS library, whereas the specific definitions are organized into a theory which carries the actual semantic information underlying the diagram. The generic definitions are made available to this latter theory by importing them. UML consists of nine standard diagrams; our formalization work has so far focused on only three of them, namely class, sequence, and statechart diagrams. We give, in the following subsection, a brief sketch of our formal semantic definitions for the UML statechart.

3.2 An Outline of Formal Semantics of UML Statechart

A UML statechart diagram is a state machine that describes all possible behavior of either a classifier (e.g. class, component etc.) or a use case. A specific behavior corresponds to a traversal of a graph of state nodes, also called state vertices. The state nodes are related by transitions that are triggered by event instances, and may result in the execution of a series of actions.
The key components of the execution semantics of UML statecharts consist of an event queue that holds incoming events until they are dispatched, an event dispatcher mechanism that selects and dequeues event instances from the queue, and an event processor that processes dispatched events. The formalization scheme adopted in this work for UML statechart diagrams consists of defining the formal semantics of a statechart diagram as a transition system consisting of a triple (I, G, N). N is a global transition relation that describes the execution sequence of the underlying state machine; G defines the global state in which the machine may be at a given time. I is an initialization predicate that describes initial global states.

3.2.1 Abstract Syntax and Well-Formedness Rules

We describe the abstract syntax of the features involved in a statechart diagram by defining a generic theory called AbstractSyntax. We give in the following an overview of this theory. The basic features involved in a statechart diagram are the concepts of state vertex, state, event, action, guard condition and transition. A state vertex is an abstraction of a node in a statechart diagram. The various kinds of state vertices include state, shallow history vertex, deep history vertex, fork, join, junction etc. We describe these elements by providing suitable type definitions in PVS.

    AbstractSyntax : THEORY
    BEGIN
      lib : LIBRARY = "~/prude/semantic/lib"
      Time, Vertex, Condition, Event, Action : TYPE+
      State : set[Vertex]

A transition is characterized by a source state, a target state, an activation event, a guard condition, and an associated action, which is executed when the transition is fired. Hence we define the syntax of a transition as follows:
Hence we define the syntax of a Transition: TYPE+ = [# source : Vertex, trigger : Event, guard : Condition, effect : Action, target : Vertex #] The set of states involved in a statechart diagram forms a tree structure consisting of a root state, composite states (e.g. can be further refined in substates) and simple states (e.g. cannot be refined). Function dsubvertex defines the set of subvertices directly contained by a given vertex. The other kind of vertices (e.g. non-state) have no subvertices; only states can have subvertices. As stated by the axioms, a composite state is either a concurrent state or a sequential state; the direct subvertices of a concurrent state are all sequential states. x, y : VAR Vertex dsubvertex: [Vertex − > set[Vertex]] compositeState?(x) : bool = member(x,State) AND dsubvertex(x) /= emptyset simpleState?(x): bool = member(x,State) AND dsubvertex(x) = emptyset isConcurrent: PRED[Vertex] isSequential(x): bool = compositeState?(x) AND NOT isConcurrent(x) ax concurrent1: AXIOM compositeState(x) <=> (isConcurrent(x) OR isSequential(x)) ax concurrent2: AXIOM isConcurrent(x) => (member(y,dsubvertex(x)) => isSequential(y)) ... END AbstractSyntax We describe the well-formedness rules defining a well-formed diagram by providing a generic theory called WellFormedness that takes a statechart instance as parameter. 82 3.2 An Outline of Formal Semantics of UML Statechart We define here the well-formedness rules as PVS axioms in theory WellFormedness. In the complete theory, we provide 7 axioms that cover all the rules defined by the standard UML informal semantic. We give in the following one of these rules, which states that: • A composite state can have at most one initial vertex, one deep history vertex and one shallow history vertex • There have to be at least two composite substates in a concurrent composite state. • A concurrent state can only have composite states as direct substates. 
• The substates of a composite state are part of only that composite state.

    WellFormedness [(IMPORTING AbstractSyntax) sm : StateMachine] : THEORY
    BEGIN
      IMPORTING AbstractSyntax
      s, s1 : VAR Vertex
      wf1 : AXIOM
        (member(s, states(sm)) AND member(s1, states(sm)) AND
         compositeState?(s) AND compositeState?(s1)) =>
          atmost1?(intersection(Initial(sm), dsubvertex(s))) AND
          atmost1?(intersection(DeepH(sm), dsubvertex(s))) AND
          atmost1?(intersection(ShallowH(sm), dsubvertex(s))) AND
          (s /= s1 <=> intersection(dsubvertex(s), dsubvertex(s1)) = ∅) AND
          (isConcurrent(s) =>
             every(compositeState?, intersection(states(sm), dsubvertex(s))))
      ...
    END WellFormedness

3.2.2 Formal Semantics

We formally define the semantic concepts underlying a statechart diagram by providing a generic theory named FormalSemantics. We describe in the following some of the features defined in that theory.

    FormalSemantics [(IMPORTING AbstractSyntax) sm : StateMachine, V : TYPE] : THEORY
    BEGIN
      IMPORTING WellFormedness[sm]
      IMPORTING finite_sequences[(events(sm))]

The essence of the formalization approach adopted in our work consists of defining a set of elementary predicates that describe relevant properties of the system state or the system operation. The set of elementary predicates is then partitioned into elementary states and events. A state describes a condition of the system that has a non-null duration. A clear distinction must be made between the concrete state of the system and the notion of abstract state used in UML statecharts. We represent the concrete state by a record type called V whose fields correspond to the concrete state variables. We define three categories of predicates associated, respectively, with the notions of state vertex, guard condition and action. The predicate associated with a state corresponds to a condition that must hold for the state to be active.
The predicate associated with an action corresponds to a condition that holds after the execution of the action; it can be regarded as the action’s postcondition. Whereas the state and the guard condition are functions of the current values of the state variables, the action’s postcondition is a function of both the current and the future values of the state variables. The state predicates need to be defined only for simple states. The predicates associated with composite states are defined as the conjunction or disjunction of the predicates of their constituents, according to whether they are concurrent or sequential states.

      VC : TYPE = [# current : V, next : V #]
      vc : VAR VC
      v : VAR V
      % Predicates for states, conditions, and actions
      pred : [Vertex -> PRED[V]]
      pred : [Condition -> PRED[V]]
      pred : [Action -> PRED[VC]]
      and_ax : AXIOM isSequential(x) <=>
                 pred(x) = disjunct({q | ∃ (y : (dsubvertex(x))) : q = pred(y)})
      or_ax : AXIOM isConcurrent(x) IMPLIES
                 pred(x) = conjunct({q | ∃ (y : (dsubvertex(x))) : q = pred(y)})

In a statechart diagram, more than one state can be active at once. If a simple state is active, then all the composite states that contain it, either directly or transitively, are also active. The set of all the states that are active simultaneously defines what is called a state configuration. We define the initial configuration initConf of a statechart as a set containing all the default states involved in the diagram. Intuitively, a configuration can be uniquely defined by providing the set of simple states involved. Therefore, we define a global predicate associated with a configuration as the conjunction of the predicates associated with the simple states involved in that configuration.
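The claim that a configuration is determined by its simple states can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the PVS definition: the state hierarchy (DSUB), the set of concurrent states (CONCURRENT) and the function names are hypothetical, and the legality check is a simplified reading of the configuration constraint (every direct substate of an active concurrent state is active, and exactly one direct substate of an active sequential state is active).

```python
def parent_map(dsub):
    """Invert the direct-substate relation: child -> parent."""
    return {child: parent for parent, children in dsub.items() for child in children}


def configuration(simple_states, dsub):
    """Close a set of active simple states under 'containing state is active'."""
    parent = parent_map(dsub)
    active = set()
    for state in simple_states:
        # Walk up the containment tree until the root or an already active state.
        while state is not None and state not in active:
            active.add(state)
            state = parent.get(state)
    return active


def is_legal(config, dsub, concurrent):
    """Check the concurrent/sequential constraint on every active composite state."""
    for x in config:
        children = dsub.get(x, set())
        if not children:
            continue  # simple state: nothing to check
        if x in concurrent:
            if not children <= config:
                return False
        elif len(children & config) != 1:
            return False
    return True


# Hypothetical hierarchy: root is sequential; A is concurrent with regions R1, R2.
DSUB = {"root": {"A", "B"}, "A": {"R1", "R2"}, "R1": {"a1", "a2"}, "R2": {"b1", "b2"}}
CONCURRENT = {"A"}
```

For example, the simple states a1 and b2 induce the configuration containing root, A, R1, R2, a1 and b2, which is legal, whereas a1 alone induces a set that violates the constraint on the concurrent state A because region R2 is missing.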
      Configuration : TYPE+ = finite_set[Vertex]
      c : VAR Configuration
      ax_configuration : AXIOM
        subset?(c, states(sm)) AND
        FORALL (x : Vertex) :
          (member(x, c) =>
             (isConcurrent(x) => subset?(dsubstate(x), c)) AND
             (isSequential(x) => singleton?(intersection(dsubstate(x), c))))
      % define an initial configuration
      initConf : Configuration
      ax_init : AXIOM
        subset?(initConf, states(sm)) AND member(root(sm), initConf) AND
        (member(x, initConf) =>
           (isSequential(x) => singleton(default(x)) AND
              (isConcurrent(x) => subset?(dsubstate(x), initConf))))
      % predicate associated with a configuration
      pred(c) : PRED[V] =
        conjunct({p : PRED[V] | EXISTS y : member(y, c) AND p = pred(y) AND simpleState?(y)})
      % Initial state predicate
      init : PRED[V] = pred(initConf)

We define, in the sequel, our transition system as a triple (init, V, next), where next is a global transition relation, V is the global (concrete) state, and init is the initialization predicate defined above.

A transition is enabled if the event instance generated matches its trigger, its guard condition is true and its source state is active. An enabled transition may be eligible for firing. Firing a transition will activate its target state and execute its action. We define below the predicates enabled and fired that describe, respectively, the enabling and firing conditions of a transition. More than one transition may be enabled within a state machine, resulting in conflict. Examples of conflicting transitions are transitions originating from the same state and triggered by the same event, but with different guards. If the event occurs and both guards are true, only one transition, chosen according to an implicit priority mechanism, will be fired. In cases where concurrent states are involved, several transitions may be fired at the occurrence of the same event.
The set of transitions that will actually be fired in the whole state machine is a maximal set of enabled transitions with the highest priorities that are mutually non-conflicting.

      e : VAR Event
      tr, tr1, tr2 : VAR Transition
      a : VAR set[Transition]
      v1, v2 : VAR V
      enabled(e, tr, v) : bool =
        pred(source(tr))(v) AND (trigger(tr) = e) AND pred(guard(tr))(v)
      fired(tr, v, v1) : bool =
        pred(target(tr))(v1) AND pred(effect(tr))(vc)
          WHERE vc = (# current := v, next := v1 #)
      maxEnabled(a, v, e) : bool =
        subset?(a, transitions(sm)) AND
        FORALL (tr : (a)) :
          enabled(e, tr, v) AND
          (FORALL (tr1 : (a)) : NOT conflict(tr, tr1)) AND
          (FORALL (tr2 | enabled(e, tr2, v) AND NOT member(tr2, a)) :
             hasPriority(tr, tr2) OR samePriority(tr, tr2))

The semantics of UML statecharts is based on the run-to-completion assumption, meaning that events are dispatched and processed one at a time. At the beginning of a run-to-completion step, a statechart is in a stable state configuration, with all the actions completed. At the end of the step, the same conditions apply as well. Before starting a run-to-completion step, a maximal set of enabled transitions is chosen non-deterministically and then fired. We define below a function called eprocess that describes event processing operations. Event processing consists of selecting and firing a maximal set of enabled transitions. In the informal statechart semantics, there are no assumptions on the order of event dequeuing; we adopt in this work a simple priority scheme based on the first-come, first-served principle. We also define the global transition relation called next based on function eprocess.
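The enabling condition, the selection of a maximal non-conflicting set and a first-come, first-served run-to-completion step can be sketched operationally in Python. This is a hedged illustration rather than the PVS semantics: the flat set of active states, the explicit integer priority field, the greedy selection, and the simplified notion of conflict (two distinct transitions leaving the same source state) are all assumptions introduced for the example.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Dict

Vars = Dict[str, int]  # concrete state variables (plays the role of V)


@dataclass(frozen=True)
class Transition:
    source: str
    trigger: str
    guard: Callable[[Vars], bool]
    effect: Callable[[Vars], Vars]
    target: str
    priority: int = 0  # hypothetical explicit priority


def enabled(t, event, active, v):
    """Source state active, trigger matches the event, and guard holds."""
    return t.source in active and t.trigger == event and t.guard(v)


def conflict(t1, t2):
    """Simplifying assumption: distinct transitions conflict if they share a source."""
    return t1 is not t2 and t1.source == t2.source


def max_enabled(transitions, event, active, v):
    """Greedily pick enabled, mutually non-conflicting transitions by priority."""
    candidates = sorted(
        (t for t in transitions if enabled(t, event, active, v)),
        key=lambda t: -t.priority,
    )
    chosen = []
    for t in candidates:
        if not any(conflict(t, c) for c in chosen):
            chosen.append(t)
    return chosen


def step(transitions, queue, active, v):
    """One run-to-completion step: dequeue the oldest event and fire the chosen set."""
    event = queue.popleft()
    for t in max_enabled(transitions, event, active, v):
        active = (active - {t.source}) | {t.target}
        v = t.effect(v)
    return active, v


# Hypothetical machine with two concurrently active states, Waiting and Idle.
T1 = Transition("Waiting", "go", lambda v: v["x"] > 0, lambda v: {**v, "x": v["x"] + 1}, "Voting", 1)
T2 = Transition("Waiting", "go", lambda v: True, lambda v: v, "Contention", 0)
T3 = Transition("Idle", "go", lambda v: True, lambda v: v, "Busy", 0)
```

With the event queue holding a single go event, the step fires T1 (which wins the conflict with T2 by priority) together with T3, since the latter leaves a different, concurrently active state.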
  c1, c2: VAR Configuration
  st: VAR set[Transition]

  eprocess(e,v,v1): bool =
    EXISTS st: subset?(st, transitions(sm)) AND maxEnabled(st,v,e) =>
      (FORALL (tr:(st)): fired(tr,v,v1))

  next(v1,v2): bool =
    EXISTS (e: (events(sm)), c1, c2):
      ((pred(c1)(v1) AND pred(c2)(v2)) => eprocess(e,v1,v2))

4 Case Study

We illustrate our approach through the case study of a network reconfiguration protocol - the IEEE 1394 tree identify protocol [4].

4.1 Summary of Requirements

The IEEE 1394 tree identify protocol is used by the 1394 high performance serial bus for leader election tasks. The bus is used to transport digitized video and audio signals within networks of multimedia systems. It has an open and scalable architecture that allows addition and removal of devices and peripherals at any time. After a bus-reset (i.e. when a node is added to, or removed from the network), all the nodes in the network have equal status and know only to which node they are directly connected. The IEEE 1394 tree identify is based on a leader election algorithm that allows the election of a leader (root) that will act as a manager of the bus for subsequent phases of 1394.

[Figure 1: Class Diagram. Class Node with attributes parent: Node, neighbors: set[Node], children: set[Node], pending: set[Node] and operations beMyParent(Node n): boolean, acknowledge(Node n), confirm(); class Network with attributes nodes: set[Node], root: Node and operation electLeader(); association roles parent (0..1), children (*), neighbors (*), pending, root: Manager (1), nodes: Regular (*).]

The protocol works properly on connected and acyclic networks. It reports an error if a cycle is detected. At the end of a successful election, the collection of nodes will form a tree whose root is the manager. During the election, each node waits for a "be my parent" request from its neighbors that are not its children.
When the number of neighbors minus the number of children is exactly 1, the node can in its turn send a "be my parent" request to the neighbor that is not a child, provided it has not already received a similar request from that neighbor. Each request is followed by an acknowledgement, and an acknowledgement of the acknowledgement. Two nodes may send a "be my parent" request to each other simultaneously, resulting in contention. The standard resolves contention by specifying that each node will, in that case, choose nondeterministically to wait for a certain amount of time and then re-send a "be my parent" request, if there was no such request from the other node. We assume that all nodes start executing at the same time.

4.2 UML Specification

We describe the system by providing a UML class diagram (see Figure 1) and a UML statechart diagram (see Figure 2).

[Figure 2: Statechart Diagram. State NetworkStatus with substates Init, Electing, ErrorDetected, and LeaderElected; Electing contains the concurrent regions Node1Status, ..., NodeKStatus, ..., NodeNStatus, each with substates Waiting, Voting, Contention, and ParentElected; transition labels electLeader(), beMyParent[c1]/accept, confirm()/update, vote()[c1], confirm()[c3], Timeout, confirm()[c2]/update, electLeader()[c4], electLeader()[c5].]

4.2.1 Class Diagram

The class diagram consists of two classes: Node and Network. The class Node represents individual nodes involved in the network. An instance of Node is characterized by a name, possibly a parent node, and three collections of nodes corresponding respectively to the neighbors, the actual children, and the potential children. Potential children are represented by the role name pending. They correspond to nodes that have already sent a "be my parent" request to a node, and are waiting for the acknowledgement.
The class Network corresponds to the collection of nodes involved in the network. An instance of Node may be either a regular child or the manager in an instance of Network; the two associations relating both classes specify that.

4.2.2 Statechart Diagram

The statechart diagram describes the dynamic behavior of the Network class in terms of the messages it sends and receives. Initially a Network object is in an initial state called Init that corresponds to the state immediately after a bus reset. Then the election starts with the occurrence of the electLeader event, bringing the Network object into the Electing state. If a leader is elected, represented by condition c4, the object will move to the LeaderElected state, ending the statechart. If a cycle is detected, represented by condition c5, an error is reported, and the object evolves to the ErrorDetected state.

The Electing state is a concurrent state whose direct substates, also called regions, describe the individual behaviour of the elements (i.e. the nodes) involved in the collection underlying a Network object. The regions of a concurrent state are specified by dividing it with dashed lines. Each region corresponds to an independent substate, which is executed concurrently when the parent state (i.e. the concurrent state) is active. Since the nodes in the collection have similar behaviour (with respect to the protocol), state Electing consists of N identical regions labelled respectively NodeiStatus, where i is a natural number such that 1 ≤ i ≤ N, and N is the number of nodes in the network.

Given i such that 1 ≤ i ≤ N, state NodeiStatus starts in a Waiting state where the corresponding node waits for a "be my parent" request, represented by event beMyParent, from its neighbours.
If a request is received from a neighbour that is not a child (condition c1), an acknowledgement is generated (action accept), followed by an acknowledgement of the acknowledgement (event confirm), and an update of the number of children of the node (action update). The update may lead to the Voting state, when the number of neighbours that are not children is exactly 1. In that state, the node can send a "be my parent" request, represented by event vote, to the neighbour. The node may also receive at the same time a "be my parent" request from the same node, resulting in contention described by state Contention. After a timeout, the node returns to the Voting state. If the request is accepted (condition c2), the node evolves to the ParentElected state, which represents the final state of the NodeiStatus region. When all the nodes but one have their parents elected, the election process ends, and the single node without any parent becomes the elected leader (condition c4).

4.3 Complementary Semantics and System Properties

The standard UML notation provides only a partial specification of the system. The UML specification produced needs to be extended by providing complementary semantics for the elementary features (e.g. states, actions, conditions etc.) and properties involved, using languages like the Object Constraint Language [10] or any other mathematical or textual languages. We give in the following some examples of complementary semantics and properties for the statechart in Figure 2 using OCL. The context of the expressions is a Network object, and two interacting Node objects k and n involved in the collection. Let us say that node k corresponds to one of the nodes whose behavior is described by NodekStatus.
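Before turning to the OCL predicates, the election rule of Sections 4.1 and 4.2.2 can be illustrated informally. The following Python sketch is not part of the PVS or OCL development: it is a sequential simulation, under the simplifying assumptions that nodes act one at a time in name order (so contention and timeouts never arise) and that a request is always accepted. The function name elect_leader and the dictionary encoding of the network are hypothetical.

```python
def elect_leader(neighbours):
    """Sequentially simulate the election on an undirected network.

    neighbours maps each node name to the set of its adjacent nodes.
    Returns (root, parent); root is None when no leader can be elected.
    """
    parent = {n: None for n in neighbours}
    children = {n: set() for n in neighbours}
    progress = True
    while progress:
        progress = False
        for k in sorted(neighbours):
            non_children = neighbours[k] - children[k]
            # A node may vote once exactly one neighbour is not its child.
            if parent[k] is None and len(non_children) == 1:
                (p,) = non_children
                children[p].add(k)   # assumed: p accepts and confirms at once
                parent[k] = p
                progress = True
    roots = [n for n in neighbours if parent[n] is None]
    # Success: one parentless node remains and all its neighbours are children.
    if len(roots) == 1 and not (neighbours[roots[0]] - children[roots[0]]):
        return roots[0], parent
    return None, parent


chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
root, parent = elect_leader(chain)
cyclic_root, _ = elect_leader(triangle)
```

On the acyclic chain the simulation ends with a single parentless node, whereas on the triangle no node ever has exactly one non-child neighbour, which corresponds to the error case of the protocol.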
4.3.1 Predicates Associated with Guard Conditions

  c1(n: Node, k: Node): Boolean =
    self.nodes→includes(n) and self.nodes→includes(k) and
    k.children→excludes(n) and k.neighbours→includes(n)

  c2(n: Node, k: Node): Boolean =
    self.nodes→includes(n) and self.nodes→includes(k) and
    k.pending→excludes(n)

4.3.2 Predicates Associated with States

  predInit(): Boolean =
    self.nodes→forAll(n | n.parent = null) and self.root = null

  predWaiting(k: Node): Boolean =
    self.nodes→includes(k) and
    ((k.neighbours→size) - (k.children→size) > 1)

4.3.3 Predicates Associated with Actions

  predUpdate(k: Node, n: Node): Boolean =
    k.children→includes(n) and (n.parent = k) and k.pending→excludes(n)

  predAccept(k: Node, n: Node): Boolean =
    k.pending→includes(n)

The outcome of the action accept (expressed by predicate predAccept) is to update the list of pending nodes, that is, the list of the nodes from which a beMyParent request has been received. The outcome of action update (expressed by predUpdate) consists of moving the requesting node from the pending list to the children list.

4.3.4 System Properties

We also give some examples of properties that characterize a Network object. Prop1 ensures that there is at most one root in the network. Prop2 states that a root is the ancestor of the other nodes in the network. Though these properties may seem trivial, expressing and checking them quite often unveils misconceptions and inconsistencies.

  Prop1: self.nodes→forAll(p1, p2 | p1 = self.root and p2 = self.root implies p1 = p2)
  Prop2: self.nodes→forAll(p | p <> self.root implies isAncestor(self.root,p))

4.4 Formal Analysis

In order to formally validate and verify the model, we need a formal description that is amenable to formal reasoning. As we already stated, we use PVS for that purpose. More specifically, we translate the OCL specification into PVS, and based on our semantic framework, we do the same for the UML graphical specification.
The two PVS specification fragments (from UML and OCL) are integrated into a single and homogeneous PVS specification that serves as a basis for formal analysis activities like consistency checking, model checking, and proof checking.

We have developed a supporting environment, to which we refer as the Precise UML Development Environment (PrUDE), which assists the specifier in generating the PVS model. The PrUDE tool also gives the specifier the possibility to invoke PVS tools, namely the type checker, model checker, and proof checker, either in batch mode or interactively. Figure 3 presents a snapshot of the PVS semantics generated using the PrUDE tool. The lower window shows the log report generated after running the PVS tool in batch mode.

[Figure 3: PVS Semantics Generated Using the PrUDE Tool]

The verification of the model is conducted by expressing the system properties in the form of PVS theorems, and then by checking them using mechanized support. For instance, property Prop1 (cf. Section 4.3), which states that there is at most one root in the network, is expressed in PVS as follows:

  p1, p2: VAR VNode
  prop1: THEOREM
    (member(p1,nodes(v)) AND member(p2,nodes(v)) ⇒
      (root(v)=p1 AND root(v)=p2 ⇒ p1=p2))

By invoking the PVS prover interactively from PrUDE, the proof of property Prop1 is as follows.
  prop1 :

    |-------
  {1}   FORALL (p1, p2: VNode, v: V):
          (member(p1, nodes(v)) AND member(p2, nodes(v)) =>
            (root(v) = p1 AND root(v) = p2 => p1 = p2))

  Rerunning step: (SKOSIMP*)
  Repeatedly Skolemizing and flattening,
  this simplifies to:
  prop1 :

  {-1}  member(p1!1, nodes(v!1))
  {-2}  member(p2!1, nodes(v!1))
  {-3}  root(v!1) = p1!1
  {-4}  root(v!1) = p2!1
    |-------
  {1}   p1!1 = p2!1

  Rerunning step: (EXPAND "member")
  Expanding the definition of member,
  this simplifies to:
  prop1 :

  {-1}  nodes(v!1)(p1!1)
  {-2}  nodes(v!1)(p2!1)
  [-3]  root(v!1) = p1!1
  [-4]  root(v!1) = p2!1
    |-------
  [1]   p1!1 = p2!1

  Rerunning step: (GROUND)
  Applying propositional simplification and decision procedures,
  Q.E.D.

  Run time = 0.17 secs.
  Real time = 0.22 secs.
  NIL
  PVS(33):

[Figure 4: Automatic Verification of Prop1 Using the PrUDE Tool]

Conducting interactive proof-checking, even from the PrUDE environment, is quite often tedious and time consuming. The properties expressed in our framework are based on a common template. Using that general structure, we have succeeded in defining general PVS proof strategies based on the notion of configuration pairs. Each strategy consists of primitive strategies, and can be used to check our target properties automatically. The proof strategy for statecharts is as follows:

  (defstep property-proof-strategy
    (then (auto-rewrite "user defined axiom1" "user defined axiom2" ...)
          (skosimp)
          (expand "ConfigurationPair")
          (grind)))

The proof strategy denoted property-proof-strategy collects the complementary semantics (e.g. user-defined axioms) as auto-rewrite rules, and invokes the skosimp command to replace universally quantified variables in the target formulas with Skolem constants. The expand command is then used to expand the configuration pair definition. Finally the grind command, a catch-all strategy, is invoked to apply all the necessary simplifications and complete the proof.
These proof strategies are implemented in PrUDE and can be invoked to check automatically any proof obligation based on our framework. If the proof fails, a counterexample is produced, which can be used to trace errors in the original UML model. Figure 4 presents a snapshot of the automatic verification of property Prop1: the property is edited using a property editor (upper window) and then checked automatically in less than a minute by invoking the prover.

5 Concluding Remarks

We have presented in this paper an automated platform that supports formal development of open distributed systems. One of the main objectives of our platform is to minimize the formal "stuff" the user of the platform has to deal with. This in turn facilitates its industrial use. In this respect, we have decided to use PVS-SL in this platform as a semantic foundation and not as a specification language. As a result, the user will not need to have an in-depth knowledge of the PVS formal notation and proof system. PVS-SL offers a very general semantic foundation and a set of powerful tools. It is highly expressive and offers several mechanisms for formal analysis. In order to enhance the automation of the formal verification process, we have defined suitable proof patterns and strategies for the kinds of properties that can be derived from our semantic model. These strategies are implemented in the current version of the PrUDE tool, and allow the automatic processing of our proof obligations.

References

[1] G. Agha, I.A. Mason, S. Smith, and C. Talcott. A Foundation for Actor Computation. Journal of Functional Programming, 7, 1997.
[2] O. J. Dahl and O. Owe. Formal Methods and the RM-ODP. Research report No. 261, March 1998. Department of Informatics, University of Oslo, Norway.
[3] A. Evans. UML class diagrams - filling the semantic gap. Technical Report, 1998. York University.
[4] IEEE. IEEE Standard for a High Performance Serial Bus, August 1995.
Standard 1394-1995.
[5] ISO-IEC JTC1/SC21/WG7. The Reference Model of Open Distributed Processing, 1995.
[6] R. Kneuper. Limits of Formal Methods. Formal Aspects of Computing, 9, 1997.
[7] R. Milner, J. Parrow, and D. Walker. A Calculus of Mobile Processes part I and II. Information and Computation, 100, 1992.
[8] The OMG. OMG Unified Modeling Language Specification, version 1.3, June 1999. OMG standard document.
[9] S. Owre, N. Shankar, J. Rushby, and D. W. Stringer-Calvert. PVS Language Reference, version 2.3, September 1999.
[10] J. B. Warmer and A. G. Kleppe. The Object Constraint Language: Precise Modeling with UML. Addison Wesley Longman Inc., Reading Massachusetts 01867, 1999.

Appendix D

A Framework for Semantics of UML Sequence Diagrams in PVS

Demissie B. Aredo

Publication: Demissie B. Aredo: A Framework for Semantics of UML Sequence Diagrams in PVS, in the Journal of Universal Computer Science (J. UCS), Springer-Verlag Co. Pub., vol. 8, no. 7, pp. 674-697, July 2002. An earlier version appeared in the Proc. of the UML2000 Workshop on Dynamic Behavior in UML Models, October 2, 2000, York, UK.

A Framework for Semantics of UML Sequence Diagrams in PVS∗

Demissie B. Aredo
Department of Informatics, University of Oslo
P. O. Box 1080 Blindern, N-0316 Oslo, Norway
E-mail: demissie@ifi.uio.no

Abstract

This paper presents a framework for representing formal semantics of a subset of the Unified Modeling Language (UML) notation in a higher-order logic; more specifically, the semantics of UML sequence diagrams is encoded into the Prototype Verification System (PVS). The primary objective of our work is to make UML models amenable to rigorous analysis by providing their precise semantics. This approach paves a way for formal development of systems through a systematic transformation of UML models. This work is a part of a long-term vision to explore how the PVS tool set can be used to underpin practical tools for analyzing UML models.
It contributes to the ongoing effort to provide a mathematical foundation for UML notations, with the aim of clarifying the semantics of the language as well as supporting the development of semantically-based tools.

Keywords: Formal Semantics, UML, PVS, Formal Methods, Object-Orientation
Category: D.3.1, D.1.5, D.2.4

1 Introduction

The Unified Modeling Language (UML) [23, 18, 4] is an object-oriented modeling language that consists of a comprehensive set of notations. It is an industry standard modeling language (standardized by the Object Management Group (OMG)) for specifying, visualizing, and documenting artifacts of software intensive systems. Among the distinguishing properties of UML is its capacity to unify a collection of notations for object-oriented modeling - a property that may raise several fundamental issues in the context of software engineering.

∗ Published in the Journal of Universal Computer Science (JUCS), Springer-Verlag Co. Pub., 8(7), pp. 674-697, July 2002, submitted: 16/1/2002, accepted: 22/7/2002, appeared: 28/7/2002, © J.UCS

Compared to other object-oriented modeling languages in software engineering, UML is more precisely defined and contains a good deal of formal specification notation, for instance, the use of the Object Constraint Language (OCL) [27] for constraint specification. However, it is not formal enough to address problems that relate to the lack of precision [10], and it suffers from the major drawback of object-oriented methodologies: their limitation in the context of formal reasoning. The semantics of UML constructs is expressed in meta-models (descriptions of UML in UML) and natural language. Although the meta-models capture a precise notion of the abstract syntax of the UML modeling elements, they do little in addressing problems related to interpretation of non-trivial UML constructs [10].
The lack of formal semantic models for graphical UML constructs imposes limitations in the context of rigorous model analysis and in developing semantics-based CASE tools [28, 10]. Consistency checks provided by currently available CASE tools are, for instance, limited to very simple syntactic checks, such as consistency of naming across models. Great improvements would be achieved if tools were augmented with deeper semantic definitions for UML models [28].

Formal methods provide the rigor that is lacking in graphical UML notations. Providing formal semantic models for the constructs of a modeling language enables us to identify and remove ambiguities, deficiencies, and inconsistencies from the language. Defining formal semantics for modeling constructs of a graphical language like UML is also a prerequisite for developing semantically based tool support.

In the sequel, we propose a semantics definition for UML sequence diagrams in the PVS specification language (PVS-SL) [21, 19]. We describe a general framework for formalization of UML diagrams, and an approach that involves graphical notations and formal methods to facilitate rigorous model analysis. The approach can readily be used to support system validation and verification. Our reference is the currently available standard documentation for the Object Management Group UML [18], that is, the informal semantics and the collection of well-formedness rules provided in the documentation.

The PVS environment is chosen as an underlying semantic foundation for the following main reasons. Firstly, PVS provides general semantic notions necessary to model reactive systems. For instance, it supports the notions of sequences, lists, records, etc. that are crucial for providing trace-based semantic models for UML sequence diagrams. Secondly, the PVS environment has a powerful tool set consisting of a type-checker, a theorem-prover, and a model-checker.
Usually, a model given in a single sequence diagram results in only a partial specification, i.e. only subsets of the set of attributes and operations can be derived from a given sequence diagram. To provide a specification of a wide range of interactions in a system, several sequence diagrams should be used in combination. Composition of message sequence diagrams is dealt with in the literature, e.g. see the works of Haugen [14], and Gunter et al. [13]. Moreover, to obtain a detailed and more complete description of both structural and behavioural aspects of a system, it is necessary to combine several modeling techniques such as class diagrams, statecharts, and sequence diagrams. A class diagram provides a structural description of classes and relationships among their objects; a statechart diagram describes the dynamic behavior of a component; and a sequence diagram specifies interactions among the components. The UML notation is a combination of these modeling techniques and emphasizes their integrated use to capture properties of systems from different viewpoints. The works of Reggio et al. [22], Blair et al. [3], and Kammüller et al. [17] address how different modeling techniques can be used.

The rest of this paper is organized as follows. In Section 2, we briefly review the PVS environment, with emphasis put on the PVS specification language and theorem-prover, and discuss how they can be used together. In Section 3, we propose semantic models for basic concepts of UML sequence diagrams such as actions, events, messages, and objects. In Section 4, we describe the methodology used in our formalization framework, which includes a bottom-up construction of semantics of UML sequence diagrams. In Section 5, we demonstrate, by an example, the application of our formalization framework to model analysis. Finally, in Section 6, we conclude and discuss future research issues.
2 The PVS Environment

The Prototype Verification System (PVS) [20, 6] is a formalism for design and analysis of system specifications. PVS consists of a highly expressive specification language, a powerful interactive theorem-prover, a type-checker, and other tools. A particular strength of PVS is its capacity to exploit the synergy between its tools, e.g. the type-checker and the theorem-prover complement each other.

The PVS specification language is based on a classical typed higher-order logic. Its type system contains basic types such as boolean, nat, integer, real, etc. and type constructors such as set, tuple, record, and function. Record, set, and function type constructors are extensively used in the sequel to encode abstract syntactic and semantic domains of UML constructs in PVS. A record constructor is a finite list of fields of the general form

  R : TYPE = [# a_1 : T_1, ..., a_n : T_n #]

where the a_i are accessor functions and the T_i are type expressions. For a record r of type R, i.e. r:R, a function application-like term a_i(r) or r`a_i, rather than the conventional 'dot' notation, is used to access the i-th field of r. The structure of a tuple type is similar to that of a record type except that the order of fields is significant in tuples. A function constructor is of the general form

  F : TYPE = [D_1, D_2, ..., D_n → R]

where the D_i and R are type expressions, and F is the set of all functions with domain D = D_1 × D_2 × ... × D_n and range R. The type of sets of elements of type T is denoted by either pred[T] or setof[T], each of which is shorthand for the function type [T → bool]. As a result, given a set s:setof[T] and an element t:T, membership of t in s is given by the truth value of the expression s(t).

The PVS type system has been augmented by predicate subtyping and dependent typing mechanisms; it is therefore richer than the type system of standard classical higher-order logic and relies on an original approach to type checking [8].
Given a type T and a predicate p:[T → bool], a predicate subtype T' = {t:T | p(t)} of T can alternatively be denoted by (p). The subtyping mechanism complicates type checking, yet it allows stronger checks for consistency and invariants in a uniform manner [6]. Accommodating partial functions in the logic of total functions, for instance, improves the expressive power of the specification language. The subtyping mechanism, however, renders type checking undecidable; as a result, the type-checker generates proof obligations called Type Correctness Conditions (TCCs), which users are required to discharge. Though a great many TCCs can be discharged automatically, the more involved ones require interactive use of the theorem-prover.

Specifications in PVS are organized into hierarchies of theories. A theory may consist of specifications of types, variables, constants, definitions, axioms, and conjectures. PVS supports modularity and reuse by means of parameterized theories that make it possible to specify generic modeling elements. The PVS-SL includes an extensive library of built-in theories, called preludes, which provide several useful definitions and lemmas.

PVS also allows definition of Abstract Data Types (ADTs), from which a complete PVS theory is automatically synthesized during type checking. The following ADT, for example, specifies the standard stack data structure along with its constructors empty and push, two accessor functions top and pop, and two recognizers empty? and nonemptystack? that characterize empty and non-empty stacks respectively.

  stack[T : TYPE] : DATATYPE
  BEGIN
    empty : empty?
    push (top: T, pop: stack) : nonemptystack?
  END stack

From such an ADT, a theory called stack_adt[T:TYPE] that consists of axioms, theorems, definitions, etc. is automatically synthesized during type checking and completely specifies the stack data type axiomatically.
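For readers less familiar with PVS datatypes, the roles of constructors, accessors, and recognizers can be mirrored in an ordinary programming language. The following Python sketch is only an informal analogue and is not generated by PVS; the class Stack and the helper names are hypothetical. The restriction of top and pop to non-empty stacks, which PVS expresses through the subtype (nonemptystack?), appears here as a run-time check.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Stack:
    """Immutable stack: cell is None for empty, else a (top, rest) pair."""
    cell: Optional[tuple] = None


EMPTY = Stack()                      # constructor `empty`


def push(x, s):                      # constructor `push`
    return Stack((x, s))


def is_empty(s):                     # recognizer `empty?`
    return s.cell is None


def is_nonempty(s):                  # recognizer `nonemptystack?`
    return s.cell is not None


def top(s):                          # accessor `top`
    if is_empty(s):
        raise ValueError("top is only defined on non-empty stacks")
    return s.cell[0]


def pop(s):                          # accessor `pop`
    if is_empty(s):
        raise ValueError("pop is only defined on non-empty stacks")
    return s.cell[1]


s = push(2, push(1, EMPTY))
unchanged = pop(push(3, s)) == s     # a push followed by a pop restores s
```

Because the dataclass is frozen and compares structurally, two stacks built from the same sequence of pushes are equal, which is what makes the final comparison meaningful.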
For instance, the following is one of the axioms generated during type checking. It states an invariant property of stacks, i.e. for any stack, a push operation followed by a pop operation leaves the stack unchanged. Symbolically,

  pop_push_ax : AXIOM (FORALL (x: T, s: stack): pop(push(x,s)) = s)

Another invariant property of stacks is that the application of two push operations followed by two pop operations to a given stack leaves the stack unchanged. Symbolically,

  pop_push_th : THEOREM (∀ (x, y: T, s: stack): pop(pop(push(x, push(y, s)))) = s)

This theorem can be discharged interactively by invoking the PVS theorem prover. It is beyond the scope of this paper to explain the details of the PVS environment; we have only highlighted some of its key features. For a more detailed presentation of the PVS environment, the interested reader should refer to [6, 19, 20].

3 Basic Concepts of UML Sequence Diagrams

The UML sequence diagram is a variant of the classical message sequence charts (MSC) [16]. Sequence diagrams are efficient constructs for modeling dynamic aspects of systems by building up storyboards of scenarios, involving the interacting objects and the messages that may be communicated among them. They show sequences of message passing as they unfold over time, and control flow throughout the interaction to effect a desired operation or result. A sequence diagram is especially useful to specify reactive systems with time-dependent functions such as real-time applications, and to model complex scenarios where time dependency plays an important role. It is a particularly useful technique to visualize dynamic behavior in the context of use case scenarios.

[Figure 1: A UML Sequence Diagram. Objects o1, o2, and o3; messages m1, m2, m3, and m4.]

To motivate the need for a formal semantics for UML sequence diagrams, let us consider the UML sequence diagram shown in Figure 1. It specifies an interaction among objects o1, o2, and o3.
It constrains messages <m1, m2, m3, m4> to occur in that order. The diagram does not, however, state whether any of the messages must occur or may occur. The sequence <m1, m2, m4> is also a valid instance of the interaction modelled by the sequence diagram. In the classical message sequence charts [16], Damm et al. [7] addressed this deficiency by introducing the concept of temperature - messages that must occur have hot temperature whereas messages that may occur have cold temperature.

To model dependencies among messages one needs a formal representation of sequence diagrams. Suppose that, in Figure 1, message m4 occurs only if messages m2 and m3 occur in that order. This behavior cannot be specified by the graphical notations and induces a strong need for formal semantics.

A sequence diagram specifies only a fragment of system behavior, usually an interaction between objects. To specify the complete behavior of an object or the system as a whole, several sequence diagrams should be used to specify all possible interactions during its life cycle [5]. The simplicity of sequence diagrams makes them suitable for expressing requirements, as they can easily be understood by customers, requirement engineers, and software developers alike [28]. The lack of formal semantics for sequence diagrams, however, makes them ambiguous and difficult to interpret.

The non-deterministic nature of sequence diagrams also aggravates the ambiguities in their interpretation. The sequence diagram shown in Figure 1, for example, turns out to be non-deterministic if message m2 is removed - the sending of m1 and m3 cannot be ordered uniquely. As a result, both <m1.out, m1.in, m3.out, m3.in, m4.out, m4.in> and <m3.out, m1.out, m1.in, m3.in, m4.out, m4.in> are allowable execution traces, where m.out and m.in denote, respectively, message sending and receiving events for message m.
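The non-determinism discussed above can be made concrete by enumerating the event orderings that respect a given set of precedence constraints. The Python sketch below is illustrative only and is not part of the PVS framework. The constraint set is an assumption made for the example (each message is sent before it is received, and m4 is sent only after m1 and m3 have been received), since the exact placement of the messages in Figure 1 is not restated here; the function name traces is hypothetical.

```python
from itertools import permutations


def traces(events, before):
    """Return every ordering of events in which a precedes b for each (a, b)."""
    result = []
    for perm in permutations(events):
        position = {e: i for i, e in enumerate(perm)}
        if all(position[a] < position[b] for a, b in before):
            result.append(perm)
    return result


events = ["m1.out", "m1.in", "m3.out", "m3.in", "m4.out", "m4.in"]
# Assumed constraints, for illustration only.
before = [
    ("m1.out", "m1.in"),   # m1 is sent before it is received
    ("m3.out", "m3.in"),   # m3 is sent before it is received
    ("m4.out", "m4.in"),   # m4 is sent before it is received
    ("m1.in", "m4.out"),   # m4 follows the reception of m1
    ("m3.in", "m4.out"),   # m4 follows the reception of m3
]
allowed = traces(events, before)
```

Under these assumed constraints, the enumeration contains both traces quoted above, together with the other interleavings of the m1 and m3 events.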
Before we define semantics of sequence diagrams, we need to provide semantic models for the basic concepts, such as actions, operations, events, messages, and objects.

3.1 Actions and Operations

An action is an invocation of an executable statement that forms an abstraction of a computational procedure that results in a change in the state of the model [18]. It can be realized by sending a message to an object or by modifying a value of an attribute. We represent an action as a record type with the following fields:

- the identifier of the action, normally the name of the associated message
- a list of arguments that determine parameters needed to perform the action
- a set of identifiers of the target objects. This enables us to capture the notion of multi-casting that is used in UML to implement message broadcast.
- a boolean variable that will be used to check whether the action is synchronous or asynchronous.

  ActionID, ObjectID, ParameterID : TYPE
  Action : TYPE = [# actionID : ActionID,
                     args     : finseq[ParameterID],
                     targets  : setof[ObjectID],
                     isAsynch : bool #]

where finseq[] and setof[] are, respectively, the types of finite sequences and of sets of elements of the type given as parameter, both predefined in the PVS library. Note that the PVS specification language is case sensitive, except for built-in identifiers, and hence actionID : ActionID is a valid field declaration.

In UML, there are several kinds of actions, namely the create, destroy, call, return, send, terminate, assignment, and uninterpreted actions. In the UML meta-model, these kinds of actions are specified as subclasses (or specializations) of the generic Action class. A CallAction, for instance, extends the general structure of Action by an attribute, which specifies the operation to be invoked, whereas the CreateAction specifies the class of which an object is to be created when the action ensues.
To encode classes related by a generalization relationship into PVS expressions, we use a general scheme that is described next. Consider the class diagram shown in Figure 2(a), where B is a subclass of A. First, the superclass A is represented as a PVS record type whose fields consist of the class identifier, a set of attributes, and a set of operations. Then, B is encoded in a similar way with one additional field of type A that captures inherited parts of B, along with its local attributes and operations. The class identifier field of a specialization class is the inherited identifier of the general class. The PVS expression shown in Figure 2(b) is obtained from the UML class diagram shown in Figure 2(a). The field asA (one for every superclass in the general case) in the representation of the subclass B captures the structure and behavior inherited from the superclass A. Detailed discussion of issues related to formal representation of structural UML modeling elements is out of the scope of this paper. Interested readers may refer to relevant works in the literature [1, 11, 12].

Let us begin by defining structural properties of operations and call actions, i.e. remote operation invocations, and requirements on their well-formedness.

  OperationID, ClassID : TYPE
  Operation : TYPE = [# operationID : OperationID,
                        isQuery     : bool,
                        parameters  : finseq[ParameterID] #]
  CallAction : TYPE = [# asAction : Action, operation : Operation #]
  CreateAction : TYPE = [# asAction : Action, class : ClassID #]
  param(ca : CallAction) : bool =
    (args(asAction(ca)) = parameters(operation(ca)))

The well-formedness rules for UML constructs are stated as predicates. For instance, the predicate param() specifies a well-formedness requirement on call actions, i.e. for any call action, the number and type of its arguments must match the parameters of the associated operation. Strictly speaking, call actions are instances of CallAction that fulfill all requirements, including well-formedness rules.
That is, they form the set of elements for which all the associated predicates hold, which is a predicate subtype of CallAction.

    (a) UML class diagram: class A with attribute x : T; subclass B with
        attribute y : D and operation f : [D → R]

    (b) PVS representation:

        D, R, T, Class : TYPE
        x : T; y : D
        f : [D → R]
        A : Class = (# classID := "A", attributes := {x}, operations := {} #)
        B : Class = (# asA := A, classID := "B", attributes := {y},
                       operations := {f} #)

    Figure 2: Representation of Inheritance in PVS

3.2 Events and Messages

An Event is a specification of a significant occurrence that has a location in time and space. In a description of communication among system components, we identify three kinds of events: a local operation call, a message send event, and a message receive event. We are interested in the externally visible behavior of objects and hence ignore local operation calls. Occurrences of message send and message receive events usually involve the invocation of an operation of one object by another (not necessarily distinct) object, the source and the target objects respectively. Formally, we represent an event as a PVS record type whose fields consist of the event identifier, which is identical to the identifier of the associated message, the sender and the receiver objects of the associated message, an attribute that specifies the kind of event, the action that will ensue, and a list of arguments. Symbolically, the Event type is specified as follows:

    EventID : TYPE; Time : TYPE = nat
    fin_set[T : TYPE] : TYPE = finite_set[T]
    EventKind : TYPE = {send, recv, local}
    Event : TYPE = [# eventID   : EventID,
                      sender    : ObjectID,
                      receivers : fin_set[ObjectID],
                      eventKind : EventKind,
                      time      : Time,
                      action    : Action #]

A message is a specification of a communication among objects, or between an object and the environment of the system, and conveys information with the expectation that activity will ensue. It also specifies the roles of the sender and receiver objects, as well as the associated action, which models the statement that causes the communication to take place.
A message can be either a signal (asynchronous) or an operation call (synchronous). A message may be multicast to several target objects. UML, however, does not directly support message broadcasting. Rather, it simulates multicasting by making it possible to target a message to a set of objects. As a result, message receivers are represented as a finite set of objects. Making a distinction between message send events (SendEvent) and message receive events (RecvEvent) is necessary to specify the behavior of objects participating in the interaction modelled by a sequence diagram. The SendEvent, RecvEvent, and LocalEvent types are specified as predicate subtypes of the Event type.

    e : VAR Event
    send?(e)  : bool = eventKind(e) = send
    recv?(e)  : bool = eventKind(e) = recv
    local?(e) : bool = eventKind(e) = local
    SendEvent  : TYPE = (send?)
    RecvEvent  : TYPE = (recv?)
    LocalEvent : TYPE = (local?)

In our framework, a message send event and the corresponding message receive event are considered to be two distinct event occurrences. A message involves exactly two (not necessarily distinct) objects: the source and the target. In the case of iterative message passing and message broadcast, each communication is modelled separately. Hence, we model a message as a pair of send and receive events. The correspondence between them has to be established uniquely. The operation to be invoked and its parameters are extracted from the associated action. An important static constraint on a message is the causality requirement, which is formalized as a relation between the set of SendEvent and the set of RecvEvent; it guarantees that a message is sent before it is received.

UML supports the notion of time. For a message m, m.sendTime and m.receiveTime (as described in OMG UML v1.3 [18], pp. 3-98) specify, respectively, the times at which the message is sent and received, that is, the times of occurrence of the associated send and receive events.
We capture the notion of time by stamping every event with the time of its occurrence; to store this information, we adorn the event record with the time field. The time information is useful for expressing temporal properties of traces of events, such as the minimum time between occurrences of events. In the sequel, however, we consider only the order of occurrences of events. The global time stamps of events can be used for merging traces by interleaving them in the order of the times of occurrence of events.

3.3 Traces of Events

A trace is a sequence of events that satisfies some predicates on events and program variables, such as the causality predicate. The semantics of an object may be described by sets of infinite and finite traces reflecting non-terminating and terminating executions. However, for safety purposes a finite trace semantics suffices to specify the behavior of a system over a finite time interval, assuming that all iterations terminate, and we consider prefix-closed sets of traces of finite length. The PVS library includes a parameterized list ADT, which is synthesized, during type checking, into a complete theory that specifies the standard list data type. We represent traces of events as a prefix-closed set of finite lists of events. To describe essential properties of traces, and ultimately the behavior of the sequence diagrams they model, we need to define some auxiliary functions on lists and events.

    t, t1, t2 : VAR list[Event]
    prefix(t1, t2) : bool = t1 = prefix_upto(length(t1), t2)

where the function prefix_upto() is defined below. Note that types and variables that are specified in earlier sections are considered available in later sections and are referenced without re-declaration.
    x, e, e1 : VAR Event; s : VAR setof[Trace]; n : VAR nat
    prefix_upto(n,t) : RECURSIVE list[Event] =
      CASES t OF
        null : null,
        cons(x, t1) : IF n = 0 THEN null
                      ELSE cons(x, prefix_upto(n-1,t1)) ENDIF
      ENDCASES
    MEASURE length(t)

In PVS, all functions are total; partiality is handled by restricting the domain of a function with predicate subtypes. Consequently, termination of every recursive function must be proved. The MEASURE clause is part of the PVS specification language and supplies the measure used to prove the termination of a recursively defined function.

    rank(e,t) : RECURSIVE nat =
      CASES t OF
        null : 0,
        cons(x, t1) : IF x=e THEN 1 ELSE 1 + rank(e,t1) ENDIF
      ENDCASES
    MEASURE length(t)

    prefix_closed(s) : bool = s(null) & (∀ e, t: s(cons(e,t)) ⇒ s(t))

    es : VAR SendEvent
    er : VAR RecvEvent
    ts, tr : VAR list[Event]
    filter_send(e,t) : list[Event] = filter(prefix_upto(rank(e,t), t), send?)
    filter_recv(e,t) : list[Event] = filter(prefix_upto(rank(e,t), t), recv?)
    causal?(t) : bool = ∀ er: member(er,t) ⇒
        length(filter_send(er,t)) - length(filter_recv(er,t)) >= 0
    Trace : TYPE = (causal?)

The prefix() and prefix_upto() functions are used to determine the correspondence between the send and receive events that may comprise a message. The filter() function returns the elements of the list given as its first argument that satisfy the predicate given as its second argument. Note that in the definition of the rank function, we are interested in the rank of events that occur in the trace given as an argument. Assigning rank zero to all the events that are not members of the trace does not affect the definition of the causality predicate causal?. The type Trace contains the finite lists of events that satisfy the causality predicate. Next, we define the prefix-closure of a given trace t and a precedence relation on the set of events w.r.t. a given trace.
    n : below(length(t))
    prefix_closure(t) : setof[Trace] = {prefix_upto(n,t) | true}
    precede(e1,e2,t) : bool = rank(e1,t) ≤ rank(e2,t)

The type below(n) is predefined in the PVS prelude and denotes the set of natural numbers strictly less than the actual parameter n.

3.4 Notions of Class and Object

A class describes a set of objects sharing a collection of features, including attributes, operations, and methods. It models the data structure and behavior of its objects. Each object of a class contains its own set of values corresponding to the structural features described in the class. In UML graphical notation, a class is rendered as a rectangular box with three compartments: the topmost compartment for the class name, the middle one for a set of attributes, and the last compartment for a set of operations. The example shown in Figure 3(a) describes a class with name Station, attribute phones, and operations requestCh, respond, activateCh, connect, gotoIdle, and gotoBase. Types and initial values of attributes, and signatures of operations, except for the names, are all optional. Figure 3(b) shows a PVS specification of the class meta-model at a higher level of abstraction (details such as the set of interfaces realized by the class are abstracted away), and its instance, the Station class.

    (a) UML class Station: attribute phones; operations requestCh(),
        respond(), activateCh(), connect(), gotoIdle(), gotoBase()

    (b) PVS representation:

        Attribute, ClassID : TYPE
        Class : TYPE = [# classID    : ClassID,
                          attributes : setof[Attribute],
                          operations : setof[Operation],
                          asClass    : setof[ClassID] #]
        Station : Class = (# classID := station,
                             attributes := {phones},
                             operations := {request, ...},
                             asClass := {} #)

    Figure 3: Representation of a Class in PVS

An object is an entity that exhibits observable properties. It specifies an instance of a class on which operations can be invoked and which has a state that stores the effects of the operations.
An object may have a set of attribute values that implement its current state, and is connected to a set of links, where both sets conform to the specification of its class. In UML sequence diagrams, the existence of an object is depicted by an object box and a life-line. A life-line is a vertical line that specifies the existence of an object over a given period of time. Object creation and/or destruction during the interaction specified by the sequence diagram, and the ordering of events that may occur on the object, are specified. A sequence diagram does not, however, specify the exact time elapsed between the occurrences of two events. The structure of an object is represented by a PVS record whose fields include: an object identifier, a class, a set of attributes, a set of operations, and a set of traces of events that models the behavior of the object. Symbolically,

    AttributeLink : TYPE
    ObjectRec : TYPE = [# objectID       : ObjectID,
                          class          : Class,
                          attributeLinks : fin_set[AttributeLink],
                          traces         : setof[Trace] #]

We define the semantics of an object as a prefix-closed set of traces of events or operation calls that satisfy certain properties such as causality. Below, we define, as predicates, the requirements that must be fulfilled by elements of type ObjectRec to be considered valid object descriptions. Then, a predicate subtype Object of ObjectRec that captures the semantics of objects is specified.
    c  : VAR Class;      at   : VAR Attribute
    op : VAR Operation;  objr : VAR ObjectRec
    classExists?(objr) : bool = NOT empty?(classes(objr))
    all_attribs(objr)  : bool = (∀ at: (slots(objr)(at) ⇒
                                  (∃ c: classes(objr)(c) & attributes(c)(at))))
    Object : TYPE = {objr | classExists?(objr) &
                       (∀ t: member(t, traces(objr)) ⇒
                          causal?(t) & prefix_closed(traces(objr)))}
    classExistLemma : LEMMA (∀ (obj : Object) : classExists?(obj))

The functions attributes and operations return, respectively, the sets of attributes and operations, local and inherited, of the class given as argument, by recursively traversing its parent classes and the interfaces it realizes. The predicates all_ops and all_attribs specify that for every operation that may be invoked on an object and for every attribute of the object, there must exist a class in the set of classes of the object in which the operation or the attribute is specified. In this paradigm, multiple and dynamic classification is supported, i.e. an object can be an instance of several classes and may dynamically gain or lose a class during system execution. However, there must always exist at least one class that specifies some structure and behavior of the object. This requirement is stated as the predicate classExists? and the lemma classExistLemma, where the latter can be discharged by invoking the PVS theorem prover. Other similar requirements, such as the conformance of the set of link ends of an object to the set of association ends of one or more of its classes, can similarly be stated and proven correct.

4 Semantics of UML Sequence Diagram

Once the basic semantic elements are represented formally, we put them together into a PVS theory that contains the representation of the semantic model of sequence diagrams. This approach is in line with the specification style of PVS: an entity must be defined before it can be referenced, and there are no forward references.
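As a sanity check on the basic elements before they are combined, the trace functions of Section 3.3 can be mirrored in an executable form. The following is a minimal Python sketch, not part of the PVS development: the Event class keeps only two of the record fields, and the field name event_id and the sample events are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Event:
    event_id: str  # hypothetical identifier shared by a send/receive pair
    kind: str      # "send", "recv" or "local", mirroring EventKind


def prefix_upto(n: int, t: List[Event]) -> List[Event]:
    """First n events of t (all of t if n exceeds its length)."""
    if not t or n == 0:
        return []
    return [t[0]] + prefix_upto(n - 1, t[1:])


def rank(e: Event, t: List[Event]) -> int:
    """1-based position of the first occurrence of e in t."""
    if not t:
        return 0
    return 1 if t[0] == e else 1 + rank(e, t[1:])


def filter_send(e: Event, t: List[Event]) -> List[Event]:
    """Send events among the first rank(e, t) events of t."""
    return [x for x in prefix_upto(rank(e, t), t) if x.kind == "send"]


def filter_recv(e: Event, t: List[Event]) -> List[Event]:
    """Receive events among the first rank(e, t) events of t."""
    return [x for x in prefix_upto(rank(e, t), t) if x.kind == "recv"]


def causal(t: List[Event]) -> bool:
    """Up to each receive event, sends are at least as many as receives."""
    return all(
        len(filter_send(er, t)) - len(filter_recv(er, t)) >= 0
        for er in t
        if er.kind == "recv"
    )


send_m1, recv_m1 = Event("m1", "send"), Event("m1", "recv")
sent_then_received = [send_m1, recv_m1]
received_then_sent = [recv_m1, send_m1]

print(causal(sent_then_received))   # True
print(causal(received_then_sent))   # False
```

Read literally, the recursion in rank yields the length of the trace for an event that does not occur in it; causal is unaffected because it only asks for the rank of events that are members of the trace.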
The semantic model of a sequence diagram should capture the behaviors that the system specified by the sequence diagram should exhibit. For example, invariant properties of the system are stated as axioms or predicates. Invariants that involve only parts that were separately defined are specified as predicates on the corresponding semantic models. We represent a sequence diagram as a PVS record type with the following fields:

- the identifier of the sequence diagram
- the set of objects participating in the interaction specified by the sequence diagram
- a prefix-closed set of traces of events modeling the interaction.

We use a (possibly infinite) set of traces of events in order to capture non-determinism. In the PVS specification language, a trace can be modelled either as a (possibly infinite) sequence or as a finite list of events. The sequence and list data types are predefined in the PVS library. In the sequel, we model traces as lists.

    SeqDiagrams : THEORY
    BEGIN
      SeqDiagramID : TYPE
      SeqDiagRecord : TYPE = [# seqDiagramID : SeqDiagramID,
                                objects : fin_set[Object],
                                traces  : setof[Trace] #]
      sqr : VAR SeqDiagRecord; obj : VAR Object
      causal(sqr) : bool = (∀ t: traces(sqr)(t) ⇒ causal?(t))
      projection : [Trace, setof[Event] → Trace] = filter
      projects(sqr) : bool =
        (∀ obj,t: (traces(sqr)(t) & objects(sqr)(obj)) ⇒
           (∀ t1 : traces(obj)(t1) ⇒
              member(projection(t, list2set(t1)), traces(obj))))
      compose(sqr) : bool =
        (∀ e,t: (traces(sqr)(t) & member(e,t)) ⇒
           (∃ obj: objects(sqr)(obj) &
              member(operation(action(e)), operations(obj))))
      prefix_closed(sqr) : bool = prefix_closed(traces(sqr))
      seqDiag : TYPE = {sqr | causal(sqr) & prefix_closed(sqr) &
                              projects(sqr) & compose(sqr)}
    END SeqDiagrams

The function list2set is a predefined PVS function that converts a list into a set. A trace of events is a possible run of the system specified by the sequence diagram if and only if it satisfies the properties specified by the predicates.
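The projects predicate of the theory can be pictured with a small executable sketch. The Python fragment below mirrors its shape for a single object, under simplifying assumptions: events are plain strings, traces are tuples, and the event names and the trace set p_traces are hypothetical.

```python
from typing import Iterable, Set, Tuple

Trace = Tuple[str, ...]  # events are plain strings here (an assumption)


def projection(t: Trace, events: Set[str]) -> Trace:
    """Keep, in order, only the events of t that belong to `events`."""
    return tuple(e for e in t if e in events)


def projects(diagram_traces: Iterable[Trace], object_traces: Set[Trace]) -> bool:
    """Every diagram trace, projected onto the events of each object trace,
    must again be a trace of the object."""
    return all(
        projection(t, set(t1)) in object_traces
        for t in diagram_traces
        for t1 in object_traces
    )


# Hypothetical traces of a single object p, closed under prefixes.
p_traces = {(), ("p.requestCh",), ("p.requestCh", "p.connect")}

in_order = ("p.requestCh", "s1.requestCh", "p.connect")
out_of_order = ("p.connect", "s1.requestCh", "p.requestCh")

print(projects([in_order], p_traces))      # True
print(projects([out_of_order], p_traces))  # False
```

The second diagram trace is rejected because its projection onto the events of p reverses the order that p itself allows.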
The projection function is defined as the built-in filter function and returns the projection of a trace onto a given set of events. The predicate projects states that for every allowable trace of a sequence diagram and every object participating in the interaction specified by the sequence diagram, the projection of the trace onto a trace of the object must be a valid trace of the object. The composition predicate compose states that for every event in a valid trace, there must exist an object in the set of interacting objects on which the operation associated with the event is invoked. Further properties, for instance model well-formedness rules and relationships between elements of a sequence diagram, can be formalized similarly.

5 Case Study: A Mobile Telephone System

5.1 System Description

In this section, we present a case study to demonstrate the use of our approach in rigorous model analysis. Consider the dynamic mobile telephone network shown in Figure 4. The network consists of a central telephone exchange c : Center, two switching stations s1, s2 : Station, and a mobile telephone p : Phone attached to a vehicle moving around. This network configuration can be generalized to any finite number of stations and telephones. Each switching station covers a given (possibly overlapping) area, and the telephone is initially connected to s1 as shown in Figure 4. Active communication channels are represented as solid lines, whereas inactive channels are represented as broken lines. Before the vehicle moves out of the range of station s1, the mobile telephone relinquishes its earlier contact with s1 and establishes contact with the station s2. This scenario is an instance of the notion of dynamic reconfiguration. Our objective is to model the reconnection interaction using a UML sequence diagram, encode the model into a PVS specification, and formally analyze its correctness and/or consistency with respect to the requirement specification.
We assume that the switching stations s1 and s2 are permanently connected to the central station c, and that the mobile telephone p is connected to station s1 before the interaction begins. A crucial system requirement is that the mobile telephone must remain connected to at least one station at any given time. This is equivalent to saying that, for a mobile telephone, the set of base stations within its range must remain nonempty.

    Figure 4: A Mobile Telephone Network (nodes c : Center, s1 : Station,
    s2 : Station, p : Phone; legend: active channel, inactive channel)

5.2 UML Specification of the System

The class diagram depicted in Figure 5 shows the specification of the structural properties of the telephone network system described above.

    Figure 5: A Class Diagram Specification (class Center with attributes
    channels, stations and operations selectCh(), confirm(); class Phone
    with attribute station and operations reconnect(), connected(); class
    Station with attribute phones and operations requestCh(), respond(),
    activateCh(), connect(), gotoIdle(), gotoBase(); association
    multiplicities *, 1, * and 1..*, with role name baseStation)

The UML sequence diagram shown in Figure 6 models the reconnection interaction that takes place when the mobile phone is leaving the range of s1 and entering the range of s2. When the signal from s1 gets weak, the mobile phone p sends a request for a channel to station s1, which in turn contacts center c to get appropriate stations and channels, respectively s2 and n in this case. We assume that c is capable, in a way we will not specify, of determining the appropriate station(s) and channel(s). When the station and the channel are confirmed, c responds to s1. Then, s1 informs p to reconnect to the identified station via the given channel, and s1 may go to the Idle state when there is no other telephone connected to it. Finally, p establishes a connection to s2, and s2 goes to the base state.
    Figure 6: Sequence Diagram: reconnection (lifelines p:Phone, s1:Station,
    c:Center, s2:Station; messages: requestCh, selectCh, activateCh, confirm,
    respond, reconnect, gotoIdle [phones=∅], connect, connected, gotoBase)

5.3 PVS Semantic Models

We provide a fragment of a PVS specification of the interaction described by the sequence diagram shown in Figure 6. The classes Center, Station and Phone are declared as classes with their respective sets of attributes and operations (only partially listed in the case of the Station class).

    Operation : TYPE = {requestCh, activateCh, respond, connect,
                        gotoIdle, gotoBase, reconnect, selectCh, confirm}
    Attribute : TYPE = {stations : setof[Station],
                        channels : setof[Channel],
                        phones   : setof[Phone]}
    Center : Class = (# classID := "Center",
                        attributes := {},
                        operations := {selectCh, confirm},
                        asClass := {} #)
    Station : Class = (# classID := "Station",
                         attributes := {phones},
                         operations := {activateCh, respond, connect,
                                        gotoIdle, gotoBase, requestCh},
                         asClass := {} #)
    Phone : Class = (# classID := "Phone",
                       attributes : setof[Attribute],
                       operations : {reconnect, connected},
                       traces : prefix_closure((: requestCh, reconnect,
                                                  connect, connected :)),
                       parents : { } #)

The objects c, s1, s2 and p are declared as instances of the Object type with appropriate values assigned to their fields. We present an explicit specification of the objects p, s1, s2 and c. Finally, we sketch an explicit model of the sequence diagram reconnection.
    c, p, s1, s2 : VAR Object
    p  : Object = (# objectID := "p", class := {Phone},
                     attributes := {stations} #)
    s1 : Object = (# objectID := "s1", classes := {Station},
                     traces := prefix_closure((: requestCh, selectCh,
                                 respond, reconnect, gotoIdle :)) #)
    s2 : Object = (# objectID := "s2", classes := {Station},
                     traces : prefix_closure((: activateCh, confirm,
                                connect, connected, gotoBase :)) #)
    c  : Object = (# objectID := "c", classes := {Center},
                     attributes := {channels, stations} #)
    sq : SeqDiag = (# seqDiagramID := "reconnection",
                      objects := {c, s1, s2, p},
                      traces := {prefix_closure((: p.requestCh, s1.requestCh,
                                    s1.selectCh, c.selectCh, ...,
                                    s1.gotoIdle, s2.gotoBase :)),
                                 . . .
                                 prefix_closure((: p.requestCh, ...
                                    s2.gotoBase, s1.gotoIdle :))} #)

In the description of traces, an event is denoted by the identifier of the object on which the event occurs followed by a dot and the name of the operation to be invoked for a RecvEvent, and vice versa for a SendEvent. For instance, requestCh.p is a send event whereas s1.requestCh is the corresponding receive event. As mentioned earlier, the specification given in Figure 6, assuming that there is no mobile phone connected to s1 other than p, states that s1 enters the Idle state after it sends the reconnect message to p. Station s2 becomes a base station for p when it receives the connect message. The UML sequence diagram shown in Figure 6 does not guarantee that the mobile telephone is connected to the new base station s2 before station s1 enters the Idle state. In classical message sequence charts (MSC) [16], an approach known as general ordering is used to guarantee a deterministic order of event occurrences. UML sequence diagrams do not support such an approach, hence the need for a formal semantics that ensures this sort of system behavior.
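The ordering issue described above can be made concrete with the rank-based precedence relation precede of Section 3.3. The following Python sketch is only an illustration: events are flattened to bare operation names, and the two interleavings early_idle and late_idle are assumed examples rather than traces taken from the specification.

```python
from typing import List


def rank(e: str, t: List[str]) -> int:
    """1-based position of the first occurrence of e in t."""
    for position, x in enumerate(t, start=1):
        if x == e:
            return position
    return len(t)  # value the recursive definition yields for non-members


def precede(e1: str, e2: str, t: List[str]) -> bool:
    """e1 occurs no later than e2 in t."""
    return rank(e1, t) <= rank(e2, t)


# s1 goes idle right after sending reconnect, before p is connected to s2.
early_idle = ["requestCh", "selectCh", "activateCh", "confirm", "respond",
              "reconnect", "gotoIdle", "connect", "connected", "gotoBase"]

# s1 goes idle only after p has been connected to s2.
late_idle = ["requestCh", "selectCh", "activateCh", "confirm", "respond",
             "reconnect", "connect", "connected", "gotoBase", "gotoIdle"]

print(precede("connected", "gotoIdle", early_idle))  # False
print(precede("connected", "gotoIdle", late_idle))   # True
```

Both interleavings contain the same events; only a predicate over the order of events distinguishes the run in which the phone is briefly left without a base station.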
Once a UML sequence diagram modeling a system interaction is encoded into the PVS specification language as a prefix-closed set of traces of events, temporal properties of the system can be stated as predicates on the traces. For instance, the idlePred predicate given below constrains the station object s1 from becoming Idle before the mobile phone is reconnected to a new base station s2.

    idlePred(t : Trace) : bool =
        (∀ t, sq: traces(sq)(t) ⇒ precede(connected, gotoIdle, t))

    pv : VAR Phone; sv : VAR StationID; cv : VAR Center; chv : VAR Channel
    isConnectedTo(pv,sv) : bool = attributes(sv)(pv) & attributes(pv)(sv)
    mayConnectTo(pv,sv)  : bool = (∃ cv: attributes(cv)(sv) &
                                     NOT attributes(pv)(sv))
    connectivityPred(pv) : bool = attributes(pv)(stations) ≠ ∅
    theorem1 : THEOREM (∀ sv, pv: NOT (isConnectedTo(pv,sv) &
                                        mayConnectTo(pv,sv)))

System requirements are stated as theorems; to verify that a specification meets the requirements, we need to discharge the theorems using the PVS proof system. For instance, the theorem theorem1 captures the fact that a mobile telephone is either connected or not connected to a station, but not both. The theorem can be discharged automatically by a single prover command, "grind". The following is a snapshot of a proof of the theorem.

    theorem1:
    {1} ∀ (pv,sv: Class): ¬ (isConnected(pv,sv) & mayConnectTo(pv,sv))
    Skolemizing,
    theorem1:
    {-1} (isConnected(pv0, sv0) & mayConnectTo(pv0, sv0))
    Trying repeated skolemization, instantiation, and if-lifting,
    This completes the proof of theorem1.
    Q.E.D.

Although the theorem follows straightforwardly from the definitions given above, it clearly demonstrates how the integrated framework enables us to exploit the strengths of the UML notations and the PVS proof system in requirement engineering. The UML models enable us to describe systems at an appropriate level of abstraction to improve our understanding of the system in question. They can be used as a contract between the stakeholders.
The corresponding semantic models, which are obtained by translating the UML models into the PVS specification language and augmented with additional PVS expressions if need be, enable us to verify important system requirements.

Two points are worth discussing in connection with the translation of UML sequence diagrams into PVS and the integration of UML CASE tools and the PVS toolkit. Firstly, we discuss how the semantic models resulting from the translation of graphical UML models interact with the PVS proof system. The semantic models may not be sufficient to capture the system requirements to be verified, and hence it may be necessary to augment them with pure PVS expressions. Verification of the overall system requirements by using the PVS proof system is straightforward, as the whole system specification is in PVS. A drawback of this approach is that users who may not be experts in formal methods must deal directly with formal specifications on the PVS side. This contradicts our aim of hiding formal artifacts at the back-end so that users interact only with the graphical front-end. An alternative approach is to specify the additional requirements in an ad hoc language such as the Object Constraint Language (OCL) [27], translate the OCL expressions into the PVS language, and reason about the constraints using the PVS theorem prover.

Secondly, the integration of a UML CASE tool and the PVS toolkit into a single platform requires a mapping of semantic models into the corresponding UML models. For instance, if the PVS theorem prover detects an error in the PVS semantic model during a verification process, how can this be communicated to users who are not experts in PVS? This can be done by developing a browser that reverse engineers the translation of UML into PVS. Keeping records of the correspondence between UML modeling elements and their counterparts in PVS specifications simplifies the parsing.
For instance, using the same identifiers in UML models and in the corresponding PVS semantic models significantly simplifies the propagation of errors detected during verification onto the UML models. This is, however, outside the scope of this paper and is one of the potential issues for future work.

6 Conclusion and Future Work

In this paper we outline a framework for the formalization of UML constructs. Expressing semantic models of UML constructs in a formal specification language enables us to rigorously analyze the models. The resulting semantic models facilitate the design and implementation stages as well as the use of formal techniques in software verification and validation tasks. Moreover, the underlying formal language and its tool set are used to underpin CASE tools that are developed to automate model analysis. In our case, once the UML modeling constructs are translated into a semantic model in PVS-SL, general properties of UML models, such as well-formedness rules, can be stated and proved correct by using PVS tools like the theorem prover and the type checker. The PVS theorem prover discharges most of the proof obligations with little interaction from the user, provided that the requirements are well formulated and do not involve complex quantifier reasoning.

This work contributes to the ongoing effort to provide a formal semantics for UML, with the aim of clarifying and disambiguating the language as well as supporting the development of semantically based tools. It is a part of our long-term vision to explore how the PVS tool set could be used to underpin practical tools to analyze UML models. There are several related research works on the formalization of UML constructs in the literature [24, 9, 10, 12, 28], mostly using Z [25] as the underlying semantic foundation. The work on the encoding of CSP [15] in PVS [8] is similar to ours.
A distinguishing feature of our work is the integration of informal graphical modeling notations with highly expressive formal notations, and the utilization of existing tools to analyze UML models. For relevant and detailed information, the reader may refer to our earlier works on the formalization of other UML modeling techniques: structural modeling techniques [1] and state machines [26, 2].

A UML sequence diagram describes a fragment of dynamic system behavior, resulting in a partial specification. To achieve a more complete system description, one needs to combine several models such as class and statechart diagrams, i.e. different viewpoints in UML vocabulary. When different modeling languages are combined, their relationship should be clearly defined, and consistency between the different viewpoints must be maintained. In the future, we will investigate how different UML modeling constructs can be used in combination and how they complement each other without violating consistency. Model checking will also be among the research topics we will investigate in the future. Reverse engineering of PVS semantic models to UML models is among the topics for future investigation.

Acknowledgements

I would like to thank Olaf Owe, Wenhui Zhang, and Issa Traoré for fruitful discussions and comments. This work was financed by the Research Council of Norway (NFR) through the research program for Distributed IT-Systems. Comments by the anonymous reviewers were useful for improving the presentation of this paper.

References

[1] D. Aredo, I. Traoré, and K. Stølen. An Outline of PVS Semantics for UML Class Diagrams (extended abstract). In the Proc. of the 11th Nordic Workshop on Programming Theory NWPT’99, Uppsala, Sweden, October 6-8, 1999.
[2] D. B. Aredo. Semantics of UML Statecharts in PVS. In the Proc. of 7th World Multiconference on Systemics, Cybernetics and Informatics (SCI2003), Orlando, Florida, USA, July 27-30, 2003.
[3] L. Blair and G. S. Blair.
Composition in Multi-Paradigm Specification Techniques. In the Proc. of 3rd International Workshop on Formal Methods for Open Object-based Distributed Systems (FMOODS’99), Florence, Italy, February 15-18, 1999. Kluwer.
[4] G. Booch, J. Rumbaugh, and I. Jacobson. The Unified Modeling Language User Guide. Addison Wesley Longman Inc, Reading Massachusetts 01867, 1999.
[5] R. Breu, R. Grosu, C. Hofmann, F. Huber, I. Kruger, B. Rumpe, M. Schmidt, and W. Schwerin. Exemplary and Complete Object Interaction Descriptions. In Haim Kilov, Bernhard Rumpe, and Ian Simmonds, editors, the Proc. of OOPSLA’97 Workshop on Object-oriented Behavioral Semantics, Atlanta, Georgia, October 1997. TUM-I9737.
[6] J. Crow, S. Owre, J. Rushby, N. Shankar, and M. Srivas. A Tutorial Introduction to PVS. In WIFT’95: Workshop on Industrial-Strength Formal Specification Techniques, Boca Raton, Florida, USA, April 1995.
[7] W. Damm and D. Harel. LSC’s: Breathing Life into Message Sequence Charts. In Formal Methods for Open Distributed Systems (FMOODS’99), Florence, Italy, February 15-18, 1999.
[8] B. Dutertre and S. Schneider. Embedding CSP in PVS: An Application to Authentication Protocols. In Theorem Proving in Higher Order Logics: 10th International Conference, TPHOLs ’97, volume 1275 of Lecture Notes in Computer Science, pages 121–136, Murray Hill, NJ, August 1997. Springer-Verlag.
[9] A. Evans. Reasoning with UML Class Diagrams. In the Proc. of WIFT’98. IEEE Press, 1998.
[10] A. Evans, R. B. France, K. Lano, and B. Rumpe. Developing the UML as a Formal Modelling Notation. In Jean Bézivin and Pierre-Alain Muller, editors, The Unified Modeling Language, UML’98 - Beyond the Notation. First International Workshop, Mulhouse, France, pages 297–307, June 1998.
[11] R. B. France, J.-M. Bruel, and M. M. Larrondo-Petrie. An Integrated Object-Oriented and Formal Modeling Environment. Journal of Object-Oriented Programming (JOOP), 10(7), December 1997.
[12] R. B.
France, A. Evans, K. Lano, and B. Rumpe. The UML as a Formal Modeling Notation. Computer Standards & Interfaces, 19:325–334, 1998.
[13] E. L. Gunter, A. Muscholl, and D. A. Peled. Compositional Message Sequence Charts. In the Proc. of TACAS 2001, pages 496–511. Springer-Verlag Heidelberg, 2001. LNCS 2031.
[14] Ø. Haugen. Practitioners Verification of SDL Systems. PhD thesis, University of Oslo, April 1997.
[15] C. A. R. Hoare. Communicating Sequential Processes. Prentice Hall, 1985.
[16] ITU-TS. ITU-TS Recommendation Z.120: Message Sequence Chart (MSC), 1996.
[17] F. Kammüller and S. Helke. Mechanical Analysis of UML State Machines and Class Diagrams. In the Proc. of Workshop on Precise Semantics for the UML. ECOOP2000, Cannes, June 2000.
[18] OMG. OMG Unified Modeling Language Specification, version 1.3, June 1999. OMG standard.
[19] S. Owre, J. Rushby, N. Shankar, and F. von Henke. Formal Verification for Fault-tolerant Architectures: Prolegomena to the design of PVS. IEEE Transactions On Software Engineering, 21(2):107–125, February 1995.
[20] S. Owre, N. Shankar, J. Rushby, and D. W. Stringer-Calvert. PVS System Guide, version 2.3. Computer Science Laboratory, SRI International, Menlo Park, CA, September 1999.
[21] S. Owre, N. Shankar, and J. M. Rushby. The PVS Specification Language, April 1993. Computer Science Lab., SRI International.
[22] G. Reggio, E. Astesiano, C. Choppy, and H. Hussmann. Analysing UML Active Classes and Associated State Machines – A Lightweight Formal Approach. In Tom Maibaum, editor, the Proc. Fundamental Approaches to Software Engineering (FASE 2000), Berlin, Germany, volume 1783 of LNCS. Springer, 2000.
[23] J. Rumbaugh, I. Jacobson, and G. Booch. The Unified Modeling Language, Reference Manual. Addison Wesley Longman Inc., 1999.
[24] M. Shroff and R. B. France. Towards a formalization of UML Class Structures in Z. In the Proc. of the COMPSAC’97, 1997.
[25] J. M. Spivey. The Z Notation: A Reference Manual.
Prentice-Hall International, 2nd edition, 1992.
[26] I. Traoré. An Outline of PVS Semantics for UML Statecharts. Journal of Universal Computer Science, 6(11):1088–1108, 2000.
[27] J. B. Warmer and A. G. Kleppe. The Object Constraint Language: Precise Modeling with UML. Addison Wesley Longman Inc., 1999.
[28] J. Whittle. Formal Approach to Systems Analysis Using UML: An Overview. Journal of Database Management, 11(4):4–13, 2000.

Appendix E

Semantics of UML Statecharts in PVS

Demissie B. Aredo
Norwegian Computing Center, P. O. Box 114 Blindern, N-0314 OSLO, Norway. E-mail: aredo@nr.no

Publication: Demissie B. Aredo: Semantics of UML Statecharts in PVS, in the Proc. of the 7th International Multi-conference on Systemics, Cybernetics and Informatics (SCI2003), July 27-30, 2003, Orlando, FL, USA.

Abstract

In this paper, we present a formal semantics for UML statecharts in the PVS specification language. Based on this semantics, we develop a general framework for translating UML statechart diagrams into PVS specifications, and show how the resulting specifications can be model-checked using the PVS toolkit. This work is part of a long-term vision to explore how the PVS formalism can be used to underpin practical tools for checking the correctness of UML models. It contributes to the ongoing effort to provide precise semantic definitions for UML notations, with the aim of clarifying the language as well as supporting the development of semantically based CASE tools.

Keywords: Formal Semantics, UML, PVS, Method Integration, Statecharts

1 Introduction

The Unified Modeling Language (UML) [13] is an industry-standard object-oriented modeling language standardized by the Object Management Group (OMG). It is a collection of several description techniques, which are suitable for modeling different aspects of software systems.
Compared to other object-oriented modeling languages in software engineering, UML is more precisely defined and contains a great deal of formal specification notation, e.g. the Object Constraint Language (OCL) [18] for specifying constraints. However, the semantic definitions of UML notations are not precise enough to support rigorous reasoning, a limitation that hampers the application of UML to rigorous system development.

In the sequel, we propose a formal semantics for UML statecharts. Our aim is to achieve two goals. Firstly, we provide a semantic model for the basic modeling elements of UML statecharts using the PVS specification language [14]. This consists of a formal representation of the abstract syntax and the well-formedness rules, and model-checking the resulting specification. Secondly, we propose a general scheme for translating UML statecharts into PVS specifications. This results in semantic models that are amenable to rigorous analysis. Using PVS tools such as the theorem-prover and the model-checker, we can reason rigorously about the resulting semantic models.

Several works have been undertaken to provide a mathematical basis for the concepts underlying object-oriented (OO) models, using different approaches and semantic foundations. In general, formalization approaches fall into three categories [5]: supplemental, OO-extension, and method-integration. In the supplemental approach, informal modeling notations are replaced by more formal constructs. The work of Moreira et al. [12] is based on this approach and involves the LOTOS and Syntropy notations. The OO-extension approach extends existing formal methods with OO features, thus making them more compatible with the concepts of object-orientation. For example, VDM++, Z++, and Object-Z are based on this approach.
Even though a rich body of formal notation results from the supplemental and extension approaches, the resulting semantic domain is more complex and suffers from a lack of tool support [1, 3]. Moreover, users have to deal directly with a certain amount of formal artifacts. Because of their esoteric nature, this is one of the major barriers to large-scale adoption of formal methods. The method-integration approach [16] makes OO notations more precise and amenable to rigorous analysis by integrating them with suitable formalism(s) [4]. It is a more workable and more commonly used approach to the formalization of OO modeling notations. The OO notation and a carefully chosen formalism, together with their respective CASE tools, are integrated, allowing developers to manipulate the graphical models they have created without in-depth knowledge of the formal specifications that are processed at the back-end [3].

Our work is based on the method-integration approach and provides semantic definitions for UML statecharts using the specification language of PVS as the underlying semantic foundation. The rest of the paper is organized as follows. In Section 2, a brief overview of the PVS specification language is presented, with emphasis on the concepts and notations used in later sections. In Section 3, the main concepts of UML statecharts are discussed. In Section 4, semantic definitions for the basic concepts of UML statecharts are proposed. Finally, in Section 5, we draw some conclusions and discuss future work.

2 The PVS Environment

PVS [15, 2] is a formalism for the design and analysis of system specifications. It consists of a highly expressive specification language tightly integrated with a type-checker, a theorem-prover, and other tools. The strength of PVS is its capacity to exploit the synergy between its specification language and tools, e.g. the type-checker uses the theorem-prover.
The theorem-prover allows proofs to be constructed interactively and rerun automatically after minor changes.

The PVS specification language (PVS-SL) provides a very general semantic foundation based on classical higher-order logic. Its type system consists of basic types such as boolean, integer, and real, and constructors for set, tuple, record, and function types. A record type consists of a finite set of fields, R: TYPE = [# a_1: T_1, ..., a_n: T_n #], where the a_i are accessor functions and the T_i are type expressions. Given a record r: R, the function-application-like term a_i(r) is used to access the i-th field of r. Tuples have a similar structure, except that the order of the fields is significant. A function type is specified as F: TYPE = [D → R], where D and R are type expressions denoting the domain and range of the functions. For a given type T, the type of sets of elements of T is specified using one of the constructs pred[T] or setof[T], each of which is a shorthand for the function type [T → bool]. For a given set s: setof[T] and t: T, membership of t in s is determined by the truth value of member(t,s), or equivalently s(t).

The type system of the PVS-SL is augmented with predicate subtyping and dependent typing mechanisms, and is therefore richer than that of classical higher-order logic. Subtyping makes type-checking more powerful and allows stronger checks for consistency and invariance in a uniform manner [2]. However, it renders type-checking undecidable, as a result of which the type-checker generates proof obligations called Type Correctness Conditions (TCCs). Many TCCs are discharged automatically, whereas more involved ones require interactive use of the theorem-prover. Predicate subtypes can be specified in two different ways. Given a type T and a predicate p on elements of T, a predicate subtype of T with respect to p can be specified as either S: TYPE = {t:T | p(t)} or S: TYPE = (p).
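As a rough executable analogue of the set and predicate-subtype constructs just described, the following Python sketch may help. It is not PVS: the names `SetOf`, `member`, `subtype`, `even`, and `Even` are illustrative assumptions, and the run-time check only loosely plays the role of a TCC, which PVS would generate and discharge statically.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

# A "setof[T]" is modelled as its characteristic predicate [T -> bool].
SetOf = Callable[[T], bool]


def member(t: T, s: SetOf) -> bool:
    """member(t, s) is simply the application s(t)."""
    return s(t)


def subtype(p: SetOf) -> Callable[[T], T]:
    """Approximate the predicate subtype {t: T | p(t)}.

    The returned function accepts a value only if p holds for it; this
    run-time check stands in for the proof obligation that PVS would
    generate when a term is used at the subtype.
    """
    def check(t: T) -> T:
        if not p(t):
            raise ValueError(f"{t!r} violates the subtype predicate")
        return t
    return check


def even(n: int) -> bool:
    """A predicate on integers, i.e. a set of integers."""
    return n % 2 == 0


# Hypothetical subtype of int, analogous to Even: TYPE = (even).
Even = subtype(even)
```

Here `member(4, even)` and `even(4)` denote the same test, mirroring the equivalence of member(t,s) and s(t).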
When the expression of the predicate is not given explicitly, we can specify S as an uninterpreted subtype of T, written S: TYPE FROM T.

The PVS prover provides primitives for inductive reasoning, rewriting, and model checking. These features simplify the proof process, as the mechanical aspects can easily be automated [8]. Specifications in PVS are organized into hierarchies of theories. A theory may contain type, variable, and constant declarations, definitions, axioms, and theorems. Modularity and reusability are captured by parameterized theories, which specify generic elements that are instantiated using the theory abbreviation construct. Predicates, known as assumptions, are used to constrain the parameters of a generic theory. PVS-SL comes with an extensive library of built-in theories, known as the prelude, which provides many useful definitions and lemmas.

A detailed presentation of the PVS environment is beyond the scope of this paper. For a more complete discussion, the interested reader may refer to [14].

3 UML Statecharts

UML statecharts [13] are the primary modeling elements for constructing executable models that capture the complex dynamic behavior of reactive systems. A statechart describes an abstract machine that defines a set of existence conditions, called states, a set of behaviors or actions that can be performed in each of those states, and a set of events that may cause state transitions according to a set of well-defined rules. A statechart describes a model element in isolation, in terms of its interaction with the rest of the world through its responses to certain events. The response of an object to an event, and the action that may ensue as a result, depend on the current state of the object and on the event that occurs. This may result in the performance of an action and a transition into another state.
An event may cause the firing of a transition and the execution of a sequence of actions associated with the transition. When the object modelled by the state machine is in a given state, it reacts only to certain events by performing the corresponding actions, and may move only into a subset of the set of states.

UML statecharts are object-oriented variants of the classical statecharts first conceived by Harel et al. [7]. The main difference between UML statecharts and the classical ones is that the former specify the behavior of types, whereas the latter specify the behavior of processes. In fact, the notion of process is not supported in the UML. The classical statecharts assume zero-time transitions, whereas a transition may take some time in UML statecharts; events are not broadcast in UML, but they may be sent to a set of objects. For a detailed comparison between UML statecharts and the classical statecharts, interested readers may refer to chapter 2, page 157, of the standard document of UML version 1.3 [13].

In the context of object-oriented modeling techniques, the elements that can have dynamic states are objects. Objects have both structural and behavioral properties. Static structural aspects of objects are described by UML class diagrams, whereas behavioral aspects can be captured by statechart and interaction diagrams. A state machine is associated with a specific modeling element, usually an object or an interaction, and specifies the complete dynamic behavior of that modeling element by describing its reaction to events. The associated modeling element determines the context of the state machine. A typical instance is the use of state machines to model the behavior of reactive objects by describing their complete life cycle.

The UML statechart diagram shown in Figure 1 specifies the complete life cycle of an account object. An account can be either in the debit or the credit state, depending on the value of its attribute balance b.
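The account life cycle of Figure 1 can also be read as a small executable model. The following Python sketch is an illustration only: the class name `Account`, the default fee, the choice of initial state from the sign of the balance, and the convention of returning `False` for a discarded event are assumptions, while the guards and balance updates follow the transition labels of Figure 1.

```python
class Account:
    """Two-state account with states 'credit' and 'debit' (Figure 1)."""

    def __init__(self, balance: float = 0.0, fee: float = 1.0) -> None:
        self.b = balance   # attribute balance b
        self.f = fee       # fixed fee f charged on deposits made in debit
        # Assumption: a non-negative opening balance starts in credit.
        self.state = "credit" if balance >= 0 else "debit"

    def deposit(self, a: float) -> bool:
        if self.state == "credit":
            # deposit(a)/b=b+a: the state is left unchanged.
            self.b += a
        else:
            # Junction p: the guard [a+b>0] is evaluated before the update,
            # and b=b+a-f is applied on both branches.
            goes_to_credit = a + self.b > 0
            self.b = self.b + a - self.f
            self.state = "credit" if goes_to_credit else "debit"
        return True

    def withdraw(self, a: float) -> bool:
        if self.state != "credit":
            # In debit only deposit(a) is allowed: the event is discarded.
            return False
        # Junction q: the guard [b-a<0] is evaluated before the update,
        # and b=b-a is applied on both branches.
        goes_to_debit = self.b - a < 0
        self.b -= a
        self.state = "debit" if goes_to_debit else "credit"
        return True
```

For instance, starting from a balance of 100 with a fee of 5, `withdraw(150)` leaves the account in debit with b = -50; a subsequent `deposit(30)` keeps it in debit with b = -25, and `deposit(40)` returns it to credit with b = 10.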
The banking system allows customers to overdraw the account, subject to a fixed fee f, hence the introduction of the debit state of the account. When an object is in the debit state, deposit(a) is the only operation allowed. At junction p, the guard condition [a+b>0] is evaluated to check the deposited amount against the balance b. Note that the balance b is less than zero when the account is in the debit state, and hence the deposited amount must be compared to -b. If the guard condition [a+b>0] is true, the account moves into the credit state; otherwise it remains in the debit state. In either case, the balance is updated by computing b:=b+a-f, where f is a constant fee charged when the account is in the debit state. When an account object is in the credit state, the deposit(a) event increases its balance by a and leaves its state unchanged. An occurrence of a withdraw(a) event when the account is in the credit state may move it into the debit state or leave it in the same state, depending on the truth value of the guard condition [b-a<0] at junction q. In either case, the balance is updated with b:=b-a.

[Figure 1: UML statechart for an Account Class. States: credit and debit; junctions: p and q. Transitions: in credit, deposit(a)/b=b+a returns to credit; in credit, withdraw(a) leads to junction q, from which [b−a<0]/b=b−a leads to debit and else/b=b−a leads to credit; in debit, deposit(a) leads to junction p, from which [b+a>0]/b=b+a−f leads to credit and else/b=b+a−f leads to debit.]

4 Semantics of UML Statecharts

In this section, we provide semantic definitions for UML statecharts by transforming them into appropriate entities in the PVS specification language. We encode the abstract syntax of UML statecharts and the associated well-formedness requirements. Note that the PVS-SL is used as the underlying semantic foundation and not as a description language; hence users are not expected to have in-depth knowledge of either the PVS-SL or its proof system. We define semantic models for statecharts using a bottom-up approach, i.e.
starting with semantic definitions of basic model elements such as states, transitions, events and actions, we provide a semantic definition for statecharts as an appropriate composition of the semantic definitions of their components. We treat the informal semantic descriptions provided in the UML v1.3 standard document [13] as a requirement specification on which the formal semantic models are based. Some constraints on UML models may involve dynamic information, e.g. the number of objects created may only be available at run time.

We specify a parameterized theory that defines a predicate on sets of elements of a type given as a parameter of the theory. The predicate optional?() holds exactly for the empty set and for singleton sets of elements of the type.

  optional[T : TYPE] : THEORY
  BEGIN
    x, y : VAR T
    s : VAR set[T]
    singleton?(s): bool = EXISTS (x:(s)): FORALL (y:(s)): y = x
    optional?(s): bool = (empty?(s) OR singleton?(s))
  END optional

Given a type T and a set s of elements of T, (s) denotes the subtype of T containing exactly the elements of s. For every type (class in the UML vocabulary) involved in an optional multiplicity, a new theory is instantiated from the generic theory optional, with the type as a parameter, using the PVS construct known as theory abbreviation. For instance, for the type T, a theory optionalT is defined as the instance optional[T] of the theory optional. The expression optionalT.optional? provides access to the predicate optional?.

  optionalT : THEORY = optional[T]
  s : (optionalT.optional?)

4.1 Abstract Syntax of UML Statecharts

We begin by representing the notions of model element, action, signal, and operation as uninterpreted types in the PVS specification language. ModelElement is the root class from which every class in the UML meta-model inherits. The details of these model elements are intentionally omitted, since they are irrelevant at the level of abstraction at which we are working.
  ModelElement : TYPE+
  Action, Signal, Operation : TYPE FROM ModelElement

Next, we discuss the notions of states, transitions and statecharts, and represent them formally.

States: A state is a specification of a snapshot of the values of program variables, or of the behavior of an object, that satisfies some, usually implicit, invariant conditions. Objects of a given class that are in the same state have the same qualitative responses to an occurrence of the same event. That is, they react to events in the same way, execute the same sequence of actions, and may undergo the same set of transitions, apart from non-determinism.

A state vertex is an abstraction of a node in a statechart diagram. In the UML meta-model, the state vertex is a direct subclass of the class ModelElement, and hence we represent it as a subtype of the type ModelElement. In general, a state vertex can be the source and target of any number of transitions. In the record type State given below, the field asStateVertex captures the properties inherited from the superclass StateVertex.

  StateVertex : TYPE FROM ModelElement

The class StateVertex can be specialized into the following four kinds of states: State, PseudoState, StubState, and SynchState. A synch state is used to synchronize concurrent regions of a state machine. Pseudo states are vertices in the state machine that are used to connect multiple transitions into a transition path. A stub state appears within a submachine to refer to the actual subvertex contained within the referenced state machine.

A state may have an entry action, the first action that takes place when the state is entered; a set of internal transitions and associated actions; and an exit action, the last action that takes place when the state is exited. Usually, an event that does not enable a transition is discarded. However, it is sometimes useful to keep this event waiting until the next state.
The set of events to which a state machine does not react while it is in a given state, but which are retained, is described as a set of "deferrable" events; the field deferable captures this set. Note that we declare variables only once and use them in the later sections.

  T : TYPE
  x, y : VAR T
  s : VAR set[T]
  optionalAction : THEORY = optional[Action]
  State : TYPE = [# asStateVertex: StateVertex,
                    entry:         (optionalAction.optional?),
                    doActivity:    (optionalAction.optional?),
                    exit:          (optionalAction.optional?),
                    deferable:     setof[Event] #]
  PseudoStateKind : TYPE = {initial, deepHist, join, shallowHist, fork, junction, choice}
  PseudoState : TYPE = [# asStateVertex: StateVertex, pseudoKind: PseudoStateKind #]
  StubState : TYPE = [# asStateVertex: StateVertex, refState: String #]
  SynchState : TYPE = [# asStateVertex: StateVertex, bound: nat #]

The class State is further specialized into SimpleState, CompositeState, and FinalState, which we represent as subtypes. A composite state can be concurrent or sequential.

  v : VAR StateVertex
  SimpleState : TYPE FROM State
  FinalState : TYPE = {v | outgoing(v) = ∅}
  CompositeState : TYPE = [# asState : State,
                             isConcurrent : bool,
                             dsubstate : finite_set[StateVertex] #]
  container : [StateVertex → CompositeState]

The container function returns the smallest composite state, if any, that contains a state vertex. The field dsubstate captures the set of direct sub-states of a state. It is used to define the function subvertex(), which returns the set of all sub-states of a given composite state. The function subvertexInc() returns the set of sub-states of a state including the state itself. When applied to the top state of a state machine, subvertexInc() returns the set of all state vertices in the state machine by recursive application of dsubstate() to the vertices.
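The containment functions described above can be sketched in executable form. This is a minimal Python sketch under stated assumptions, not the PVS encoding: vertices are identified by strings, and the direct sub-state relation is a hypothetical dictionary `dsubstate` describing an acyclic containment hierarchy.

```python
from typing import Dict, Optional, Set

# Hypothetical containment hierarchy: composite state -> direct sub-states.
# The hierarchy is assumed to be acyclic, so the recursion terminates.
dsubstate: Dict[str, Set[str]] = {
    "top": {"A", "B"},
    "A": {"A1", "A2"},
    "A2": {"A2a"},
}


def subvertex(cs: str) -> Set[str]:
    """All sub-states of cs, direct or transitive (cs itself excluded)."""
    result: Set[str] = set()
    for v in dsubstate.get(cs, set()):
        result.add(v)
        result |= subvertex(v)  # recurse into nested composite states
    return result


def subvertex_inc(cs: str) -> Set[str]:
    """Sub-states of cs, including cs itself."""
    return {cs} | subvertex(cs)


def container(v: str) -> Optional[str]:
    """The composite state that directly contains v, or None for the top state."""
    for cs, children in dsubstate.items():
        if v in children:
            return cs
    return None
```

Applying `subvertex_inc` to `"top"` yields every vertex of this example hierarchy, which mirrors how the set of all state vertices of a state machine is recovered from its top state.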
  contains(v,cs): bool = CompositeState(cs) ∧ member(v, dsubstate(cs))
  subvertex(cs): RECURSIVE setof[StateVertex] =
      union(dsubstate(cs), ⋃ {subvertex(v) | v ∈ dsubstate(cs)})
    MEASURE (LAMBDA cs: dsubstate(cs) ≠ ∅)
  subvertexInc(cs): setof[StateVertex] = union({cs}, subvertex(cs))

If an event is deferred in a given composite state, then it is deferred in every sub-state of that state. We add the axiom deferax given below to capture this notion.

  v, v′ : VAR StateVertex
  cs : VAR CompositeState
  deferax: AXIOM (v ∈ subvertexInc(cs)) ⇒ (deferable(cs) ⊆ deferable(v))

Transitions: A transition in UML statecharts models a change in object behavior from one state to another (not necessarily distinct) state in response to the reception of an event. The set of transitions specifies the reaction of an object to events, or the actions carried out by its methods in response to the occurrence of an event. In other words, an object in a given state, called the source of the transition, evolves into another state, called the target state, when a specific event occurs and a guard condition is satisfied, and performs a sequence of actions.

A transition in a statechart may be labelled by a string of the form e[c]/sa, which means that the occurrence of event e, when the guard condition c is true, triggers the firing of the transition, as a result of which the object performs the sequence of actions sa. The UML standard [13] also allows triggerless transitions, known as completion transitions. They have implicit triggers, i.e. completion events, which are generated when all transitions, entry actions and activities in the currently active state are completed.

To define the semantics of a transition, we need the types Event, Action, and Guard, and instances of the theory optional instantiated with these types. Then, the notion of transition is captured by a record type with an appropriate set of fields.
  Event : TYPE FROM ModelElement
  Guard : TYPE = [# asModelElement: ModelElement, expression: BoolExpression #]
  optionalEvent : THEORY = optional[Event]
  optionalGuard : THEORY = optional[Guard]
  optionalAction : THEORY = optional[Action]
  Transition : TYPE = [# asModelElement: ModelElement,
                         source:         StateVertex,
                         trigger:        (optionalEvent.optional?),
                         guard:          (optionalGuard.optional?),
                         effect:         (optionalAction.optional?),
                         target:         StateVertex #]

We define some operations that specify associations between states and transitions. The functions incoming() and outgoing(), defined on StateVertex, return the sets of transitions entering and leaving the vertex, respectively. A transition connects exactly one source state and one target state, which are retrieved by applying the accessor functions source and target, respectively, to the transition record.

  incoming : [StateVertex → setof[Transition]]
  outgoing : [StateVertex → setof[Transition]]

State Machines: A state machine can be described completely by a top state, i.e. a composite state at the root of the state containment hierarchy, and a set of transitions. Given the top state of a state machine and the set of its transitions, all the remaining states can be retrieved by traversing the state containment hierarchy starting at the top state. Applying the subvertexInc() function described above to the top state of a state machine returns the set of all state vertices in the state machine. The semantics of a state machine is defined as a record type whose fields contain the top state vertex and the set of transitions. Symbolically,

  StateMachine : TYPE = [# asModelElement: ModelElement,
                           top:            StateVertex,
                           transitions:    setof[Transition],
                           context:        ModelElement #]
  context : [StateMachine → Context]

The function context() determines the model element whose behavior is captured by the state machine.
A model element can be described by several state machines, but a given state machine describes at most one model element. The specification of the function context() ensures that this requirement is fulfilled.

The SubmachineState defined below is a syntactical convenience that facilitates modularity and reuse, and is semantically equivalent to a composite state. It is a placeholder for a state machine that is referenced by another state machine. The submachine() function defined below determines the state machine for which a submachine state stands in a given composite state. The stateMachine() function returns the state machine to which a transition belongs.

  SubmachineState : TYPE FROM CompositeState
  submachine : [SubmachineState, CompositeState → StateMachine]
  stateMachine : [Transition → StateMachine]

4.2 Well-formedness Requirements

In this section we formalize well-formedness requirements (WFRs) on some of the modeling elements described above. The well-formedness rules can be defined in the same theory as the model elements they constrain, or in a separate theory that is then imported. We follow the latter option, since this approach matches the informal descriptions given in the standard document of UML v1.3 [13]. Each WFR is named after the initials of the model element it constrains, followed by the number of the corresponding rule in the UML standard document [13]. For instance, ruleCS1 corresponds to the first well-formedness rule for composite states.

  s  : VAR State;        c1 : VAR CompositeState
  v  : VAR StateVertex;  m  : VAR StateMachine
  ps : VAR PseudoState;  t  : VAR Transition

WFRs of Composite States: The following WFRs apply to CompositeState. A composite state can contain at most one vertex of each of the pseudostates initial, deepHist, and shallowHist.
  ruleCS1(cs): bool =
      optional?({ps | ps ∈ subvertex(cs) ∧ pseudoKind(ps) = initial})
    ∧ optional?({ps | ps ∈ subvertex(cs) ∧ pseudoKind(ps) = deepHist})
    ∧ optional?({ps | ps ∈ subvertex(cs) ∧ pseudoKind(ps) = shallowHist})

A concurrent composite state must have at least two direct subvertices, each of which is a composite state.

  ruleCS2(cs): bool = isConcurrent(cs) ⇒
      ((‖subvertex(cs)‖ ≥ 2) ∧ (subvertex(cs) ⊆ CompositeState))

where ‖·‖ is a function that returns the cardinality of a set. A given state vertex can be a part of at most one composite state.

  ruleCS3(v): bool = (v ∈ dsubstate(cs) ∧ v ∈ dsubstate(c1)) ⇒ cs = c1

WFRs of Transitions: A fork segment should not have guards or triggers:

  ruleT1(t): bool = (PseudoState(source(t)) ∧ pseudoKind(source(t)) = fork) ⇒
      (guard(t) = ∅ ∧ trigger(t) = ∅)

A join segment should not have guards or triggers:

  ruleT2(t): bool = (PseudoState(target(t)) ∧ pseudoKind(target(t)) = join) ⇒
      (guard(t) = ∅ ∧ trigger(t) = ∅)

A fork segment should always target a state:

  ruleT3(t): bool = (stateMachine(t) ≠ ∅ ∧ PseudoState(source(t)) ∧ pseudoKind(source(t)) = fork) ⇒
      State(target(t))

A join segment should always originate from a state:

  ruleT4(t): bool = (stateMachine(t) ≠ ∅ ∧ PseudoState(target(t)) ∧ pseudoKind(target(t)) = join) ⇒
      State(source(t))

Transitions outgoing from pseudostates may not have a trigger:

  ruleT5(t): bool = PseudoState(source(t)) ⇒ trigger(t) = ∅

Join segments should originate from orthogonal states:

  ruleT6(t): bool = (PseudoState(target(t)) ∧ pseudoKind(target(t)) = join) ⇒
      isConcurrent(container(source(t)))

Fork segments should target orthogonal states:

  ruleT7(t): bool = (PseudoState(source(t)) ∧ pseudoKind(source(t)) = fork) ⇒
      isConcurrent(target(t))

An initial transition at the topmost level may have a trigger with the stereotype "create". An initial transition of a StateMachine modeling a behavioral feature has a CallEvent trigger associated with that BehavioralFeature.
Apart from these cases, an initial transition never has a trigger:

  CallEvent : TYPE FROM Event
  stereotype : [ModelElement → ModelElement]
  ruleT8(t): bool =
    (PseudoState(source(t)) ∧ pseudoKind(source(t)) = initial) ⇒
      (trigger(t) = ∅
       ∨ (container(source(t)) = top(stateMachine(t)) ∧ name(stereotype(trigger(t))) = "create")
       ∨ (BehavioralFeature(context(stateMachine(t))) ∧ CallEvent(trigger(t))
          ∧ operation(trigger(t)) = context(stateMachine(t))))

WFRs of State Machines: A state machine is aggregated either within a classifier or within a behavioral feature. The context of a state machine should be an object or a behavior, as specified by the well-formedness requirement ruleSM1 given below.

  ruleSM1(m): bool = Classifier(context(m)) ∨ BehavioralFeature(context(m))

The following three expressions specify that the top state of a state machine is always a composite state, that the top state does not have a container state, and that it cannot be the source of a transition.

  ruleSM2(m): bool = CompositeState(top(m))
  ruleSM3(m): bool = container(top(m)) = ∅
  ruleSM4(m): bool = outgoing(top(m)) = ∅

If a state machine describes a behavioral feature, it contains no trigger of type CallEvent, apart from the trigger on the initial transition.

  ruleSM5(m): bool = BehavioralFeature(context(m)) ⇒
      (∀ t: t ∈ transitions(m) ∧ NOT (PseudoState(source(t)) ∧ pseudoKind(source(t)) = initial)
            ⇒ trigger(t) = ∅)

4.3 Semantic Definitions

Once the abstract syntax of the basic elements of UML state machines and their well-formedness requirements are precisely encoded in the PVS specification language, providing semantic definitions for more complex model elements becomes easier. Formalizing the semantic concepts of UML state machines paves the way for specifying important properties exhibited by the system and for reasoning rigorously about their correctness.
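To illustrate how well-formedness rules such as those of Section 4.2 can be checked on a concrete model, here is a minimal Python sketch. It is not the PVS encoding: the data classes `Vertex` and `Transition` and their field names are assumptions, optional fields are modelled with `None` rather than with the optional? predicate, and only rules T1, T2 and T5 are covered.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Vertex:
    name: str
    # "fork", "join", "initial", ...; None for ordinary (non-pseudo) states.
    pseudo_kind: Optional[str] = None


@dataclass(frozen=True)
class Transition:
    source: Vertex
    target: Vertex
    trigger: Optional[str] = None
    guard: Optional[str] = None


def rule_t1(t: Transition) -> bool:
    """A fork segment should not have guards or triggers."""
    return t.source.pseudo_kind != "fork" or (t.guard is None and t.trigger is None)


def rule_t2(t: Transition) -> bool:
    """A join segment should not have guards or triggers."""
    return t.target.pseudo_kind != "join" or (t.guard is None and t.trigger is None)


def rule_t5(t: Transition) -> bool:
    """Transitions outgoing from pseudostates may not have a trigger."""
    return t.source.pseudo_kind is None or t.trigger is None


def well_formed(t: Transition) -> bool:
    """Conjunction of the three rules above."""
    return rule_t1(t) and rule_t2(t) and rule_t5(t)
```

Each rule is written as an implication in the form "not premise or conclusion", so a transition that does not involve the relevant pseudostate satisfies the rule trivially, as in the PVS predicates.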
In general, for a UML model M whose abstract syntax is encoded in the PVS-SL as SyntaxM and whose well-formedness requirements are encoded as predicates ruleM1, ..., ruleMk, the semantics SemM is the predicate subtype of SyntaxM with respect to the conjunction of its well-formedness predicates. For instance, the semantics of the state machine is defined as follows:

  SemStateMachine : TYPE = {m | ruleSM1(m) ∧ ... ∧ ruleSM5(m)}

A state is said to be active when it is entered as a result of a transition, and becomes inactive when it is exited. A state can be thought of as a predicate on the set of program variables. The state is active when this predicate returns the value true. For a composite state that is active and non-concurrent, exactly one of its substates is active. If a composite state is active and concurrent, then all of its substates are active.

  active : [StateVertex → bool]
  activeAx1: AXIOM (active(c) ∧ NOT isConcurrent(c) ∧ v ∈ subvertex(c)) ⇒
      ‖{v:(dsubstate(c)) | active(v)}‖ = 1
  activeAx2: AXIOM (active(c) ∧ isConcurrent(c) ∧ v ∈ subvertex(c)) ⇒
      (FORALL (v:(dsubstate(c))): active(v))

If a given simple state is active, then every composite state containing the state, directly or transitively, is also active. Since some of the composite states may be concurrent, the current active state is represented by a tree of states, called a state configuration, starting with the topmost composite state and going down to individual simple states at the leaves.

  configuration : [StateMachine → setof[State]]
  configuration(sm) = {s | s ∈ subvertexInc(top(sm)) ∧ active(s)}

More advanced semantic concepts such as conflicting transitions, firing priorities, etc. can similarly be formalized in terms of the basic concepts of UML statecharts defined above.

5 Conclusion

We have proposed semantic definitions for UML statecharts using the PVS specification language as the underlying semantic foundation. The main objective of the work is to give a precise and unequivocal description of UML statecharts.
Such a precise description is required as a reference model for implementing tools for code generation, simulation and verification of UML statecharts. The framework integrates a UML CASE tool and the PVS toolkit, resulting in a heterogeneous platform that combines the strengths of a semi-formal graphical modeling notation and a formal verification environment. Other benefits of transforming UML statecharts into the PVS-SL include the ability to produce precise and analyzable specifications, and the availability of the PVS toolkit, which supports rigorous reasoning about the resulting semantic models.

Several semantics for statecharts have been proposed in the literature [7, 6, 9, 17]. Most of them are concerned with defining the semantics of Harel’s classical statecharts [7]. For instance, Harel et al. [7, 6] present the semantics of classical statecharts in the STATEMATE system. Mikk et al. [11] propose a formal semantics of UML statecharts based on hierarchical automata. The representation in hierarchical automata is not suitable for tool development [10]. It does not directly support transitions across compound states, and the hierarchical structure must be flattened before it can be used in a model checker. The work presented here is similar to the work presented in [17], but is more detailed.

This work contributes to the ongoing effort to provide standard formal semantic definitions for UML notations, with the aim of clarifying and disambiguating the language as well as supporting the development of semantically based tools. It is a part of our long-term vision to explore how the PVS tool set could be used to underpin practical CASE tools for analyzing UML models.

Acknowledgements

The author is grateful to Olaf Owe, Wenhui Zhang, and Issa Traoré for their invaluable comments. This work was funded by the Research Council of Norway through the ADAPT-FT project.

References

[1] J.-M. Bruel and Robert B. France.
Transforming UML Models to Formal Specifications. In the Proc. of the OOPSLA’98 Workshop on Formalizing UML. Why? How?, Vancouver, Canada, October 1998. [2] J. Crow, S. Owre, J. Rushby, N. Shankar, and M. Srivas. A Tutorial Introduction to PVS. In WIFT’95: Workshop on Industrial-Strength Formal Specification Techniques, Boca Raton, Florida, USA, April 1995. [3] A. Evans. Reasoning with UML Class Diagrams. In the Proc. of WIFT’98. IEEE Press, 1998. [4] R. B. France, J.-M. Bruel, and M. M. Larrondo-Petrie. An Integrated Object-Oriented and Formal Modeling Environment. Journal of Object-Oriented Programming (JOOP), 10(7), December 1997. [5] R. B. France, A. Evans, K. Lano, and B. Rumpe. The UML as a Formal Modeling Notation. Computer Standards & Interfaces, 19:325–334, 1998. [6] D. Harel and A. Naamad. The STATEMATE Semantics of Statecharts. ACM Transactions on Software Engineering and Methodology, 5(4):293–333, October 1996. [7] D. Harel, A. Penueli, J. P. Schmidt, and R. Sherman. On the Formal Semantics of Statecharts. In the Proc. of the 2nd IEEE Symposium on Logic in Computer Science, pages 54–64, New York, USA, 1987. IEEE Press. [8] P. Krishnan. Consistency Checks for UML. In the Proc. of the Asia Pacific Software Engineering Conference (APSEC 2000), pages 162–169, December 2000. [9] D. Latella, I. Majzik, and M. Massink. Towards a Formal Operational Semantics of UML Statechart Diagrams. In the Proc. of FMOODS’99, Florence, Italy. Kluwer, February 15-18, 1999. [10] J. Lilius and I. P. Paltor. The Semantics of UML State Machines. Technical Report No. 273, May 1999. Turku Centre for Computer Science, Finland. [11] E. Mikk, Y. Lakhnech, and M. Siegel. Hierarchical Automata as Model for Statecharts. In K. Ueda R. K. Shyamasundar, editor, the Proc. of Asian Computing Science Conference (ASIAN’97), volume 1345 of LNCS, pages 181–196. Springer Verlag, December 9-11 1997. [12] A. Moreira and R. Clark. Combining Object-oriented Analysis and Formal Description Techniques. 
In the Proc. of ECOOP’94, LNCS, volume 821, Bologna, Italy, 1994. Springer-Verlag. [13] The OMG. OMG Unified Modeling Language Specification, version 1.3, June 1999. OMG standard document. [14] S. Owre, J. Rushby, N. Shankar, and F. von Henke. Formal Verification for Fault-tolerant Architectures: Prolegomena to the Design of PVS. IEEE Trans. On Soft. Eng., 21(2):107–125, February 1995. [15] S. Owre, N. Shankar, J. Rushby, and D. W. Stringer-Calvert. PVS System Guide, version 2.3, September 1999. [16] M. Shroff and R. B. France. Towards a Formalization of UML Class Structures in Z. In the Proc. of the COMPSAC’97, 1997. [17] I. Traoré. An Outline of PVS Semantics for UML Statecharts. Journal of Universal Computer Science, 6(11):1088–1108, 2000. [18] J. B. Warmer and A. G. Kleppe. The Object Constraint Language: Precise Modeling with UML. Addison Wesley Longman Inc., 1999.

Appendix F

Tracking Inconsistencies in an Integrated Platform

I. Traoré, D. B. Aredo and K. Stølen

Publication: I. Traoré, D. B. Aredo and K. Stølen: Tracking Inconsistencies in an Integrated Platform, Research Report 274, Department of Informatics, University of Oslo, Norway, August 1999.

Tracking Inconsistencies in Integrated Platforms

I. Traoré, D. B. Aredo and K. Stølen
Department of Informatics, University of Oslo
P. O. Box 1080 Blindern, N-0316 Oslo, Norway
{issat,demissie,ketils}@ifi.uio.no

Abstract

A response to the increasing complexity of contemporary systems is the use of integrated platforms for their development. Integrated platforms may involve different technologies and methodologies, which may unavoidably lead to inconsistencies. Tracking inconsistencies in such environments is still an open issue, especially when different formalisms are involved. In this paper, we introduce an approach to deal with such inconsistencies, based on semantic equivalence between constructs in the different languages involved.
We present a case study involving two specification formalisms, namely UML and OUN.

Keywords: complex systems, consistency checking, requirement, specification, integrated platform, UML, OUN

1 Introduction

Recent decades have seen the widespread use of software applications; several tasks that used to be performed manually are now carried out by software. For instance, in the aeronautics industry, evidence of this is the increasing amount of avionics, which currently represents about 30% of the cost of an aircraft [Cas94]. Another instance can be found in the telecommunication industry, where the incremental feature-by-feature extension of systems’ functionality has led to the problem of feature interaction [JZ98]. As a consequence, current software systems have reached unmanageable size and complexity [GJM91]. Hence the development process involves several participants and uses different technologies and methodologies, unavoidably resulting in conflicts and inconsistencies, one of the major sources of errors [NKF94]. In order to improve the quality and productivity of software development, it is important to find a means to handle these inconsistencies, especially in the earlier phases, where fixing an error is far cheaper than in later phases.

There are various kinds and sources of inconsistencies. Development processes may be inconsistent by involving contradictory activities; software artifacts may be inconsistent by containing contradictory requirements. Inconsistencies may arise during requirement engineering, at design level and during programming [GN98]. Inconsistencies may also arise between different phases of the development process: between requirements and design, between design and implementation, etc. [ECW98]. But even if it is important to detect inconsistencies, their removal should depend on the context.
Sometimes the removal of certain inconsistencies results in new ones; sometimes it is better to find ways to live with inconsistencies and postpone their removal [HN97]. A systematic removal may constrain the development process unnecessarily [FGH+93].

Considerable results have been achieved in research on consistency checking within a single formalism [HJL96, HL96]. This is based mainly on syntactic and semantic checks, together with some additional checks specific to the modeling scheme considered, in order to achieve what is broadly considered internal consistency. The most difficult question arises when we are dealing with inconsistencies across language borders in a platform that uses different languages [BDS96, GHM98]. One reason for this is the confusion about the actual meaning of inconsistency: there are several definitions in the literature and there is no agreement among researchers. According to [BDS96], up to three interpretations of inconsistency can be drawn from the RM-ODP [JTC95]. Another reason is that there are several kinds of inconsistencies; nine different kinds are identified in [LDL98]. This diversity of inconsistencies calls for different approaches, each dealing with specific kinds of inconsistencies. Such approaches should exhibit at least the following four characteristics:

• existence of a solid theoretical basis in order to allow rigorous reasoning;
• support for automation in order to facilitate industrial use;
• applicability to a wide range of formalisms;
• extensibility in order to ease the evolution of the platform in which they may be involved.

In this case, the syntactic and semantic schemes used for internal consistency do not work, since we are dealing with syntactic and semantic entities belonging to different formalisms. The subject matter in this setting is the contradiction that may arise from the representation of the same knowledge within different modeling schemes.
We believe that finding out why different representations of the same knowledge may yield contradictory meanings should be possible by analyzing the interactions occurring among the formalisms involved. In this paper, we propose an approach to track inconsistencies by analysing these interactions. This approach is based on the decomposition style adopted in the integrated platform, that is, a codification of how concerns are separated and how languages are built on one another.

The rest of the paper is organized as follows. In Section 2, we present our understanding of the concept of inconsistency and, at the same time, introduce our approach. Then, in Section 3, we present a platform that integrates two specification formalisms: the Unified Modeling Language (UML) [OMG99, BRJ99] and the Oslo University Notation (OUN) [OR99]. In Section 4, a consistency checking scheme is presented. In Section 5, we discuss a case study based on the requirements of a mobile telephone system. Finally, in Section 6, we make some concluding remarks.

2 Presentation of Our Approach

2.1 Context

As we mentioned in the introduction, there are different categories of inconsistencies, and different criteria can be used to identify them. From our experience in dealing with integrated frameworks, we know that two criteria cover most of the inconsistencies: classification with respect to the stages of development, and classification with respect to the formalisms involved. Based on these criteria, given a pair of specification languages integrated in a system development, we consider three classes of inconsistencies, namely inconsistencies:

1. between different phases of development (either in the same language or in different languages); this should be dealt with in correlation with refinement.
2. in the same language and at the same phase of development; this is equivalent to the case of internal consistency.
3. between different languages and at the same phase of development; this is one of the most challenging issues.

Our work focuses on the last kind of inconsistency, between specifications given in the UML and OUN notations.

2.2 Outline of our Approach

For two specification languages L1 and L2, we represent the types of consistency handled in the sequel by a relation C ⊆ Syn_L1 × Syn_L2 that must hold between pairs of specifications developed by using the languages. Syn_L denotes the syntactic domain associated with a language L. Relation C is defined during the design of the integrated platform.

In this approach, we assume that internal consistency is already achieved within each formalism. We base our work on the analysis of interactions among different formalisms by relating constructs that are semantically equivalent in each formalism. Specifically, we define the relation C by providing an abstract syntax and a set of definitions that describe how specific pairs of constructs are related. In some cases, semantic equivalence between constructs in different specification languages is straightforward, but in other cases it requires some adaptation or can be obtained only by defining specific conditions.

The analysis of the interactions occurring in a specific platform should take into account the decomposition style adopted. A decomposition style determines precisely which specification languages are used, which system properties are specified in each language, and how specifications interact across language boundaries.

2.2.1 Generalization: So far we have presented the case of two formalisms. However, the generalization of our approach to more than two formalisms is straightforward. To this end, given a platform involving languages L1, ..., Ln (n ≥ 2), we define C as a boolean function:

C : Syn_L1 × ... × Syn_Ln → Bool

which yields true if the specifications developed in this platform are pairwise consistent.
For each language Lj, 1 ≤ j ≤ n, we provide an abstract syntax. For each pair of languages (Li, Lj), i ≠ j, we define a semantic equivalence relation Cij in the same way as the relation C is provided for n = 2. Function C is defined by combining the definitions provided by the relations Cij:

C(Spec1, ..., Specn) ⇔ ⋀_{1 ≤ i,j ≤ n, i ≠ j} Cij(Speci, Specj)

2.2.2 Automation and extensibility: The structure of our approach facilitates automation and extension. Most of the properties are algorithmically decidable; for those that are not, theorem proving may be required. The automation of this approach may consist of three tools: an automatic consistency checker, which carries out the algorithmic checks, and a proof generator augmented by a theorem prover for the undecidable cases.

3 A Platform Involving two Notations: UML and OUN

We integrate UML and OUN in a platform dedicated to the formal description of open distributed systems [TS99]. The aim of the platform is to put together various capabilities of the formalisms and modeling languages, such as user friendliness and communicability for easy use in industrial settings, the ability to support major aspects of open distributed systems such as openness and dynamic reconfiguration, and the support for formal reasoning. UML is an object-oriented language based on graphical notations. OUN is an object-oriented formal method targeted towards the formal development of open distributed systems. The integration of UML and OUN is built on a common semantic basis provided by the PVS Specification Language (PVS-SL) [ORSH95, OSRSC99]. Though the proof system of PVS provides support for formal reasoning, the user does not need an in-depth knowledge of the PVS formal system, since PVS is used in this platform as a semantic foundation and not as a specification language.
3.1 UML

The UML is mainly based on a graphical notation, which consists of static structures, such as class diagrams, and dynamic behaviors, such as use cases, interaction diagrams, statecharts, and implementation diagrams:

• use cases and actors define the boundary of a system and its major functionalities;
• interaction diagrams illustrate realizations of use cases;
• class diagrams describe the static structure of systems;
• state transition diagrams model the behavior of objects;
• component diagrams illustrate the organization of the system and the dependencies among software components;
• deployment diagrams show the distribution of components across the enterprise.

A class diagram consists of a set of classes and interfaces, and relationships among them. There are different kinds of relationships: association (a bi-directional connection between classes), aggregation (a relationship between a whole and its parts), inheritance (generalization/specialization), realization (between class and interface), etc. A UML interaction diagram commonly contains objects, links among objects, and the messages they communicate.

3.2 The Oslo University Notation (OUN)

A requirement specification in OUN is given in terms of interfaces and contracts. It is a form of rely-guarantee specification, which may include assumptions and invariants about the environment [OR99]. Classes may appear later, during design specification, and may contain the definition of the attributes and the implementation of operations. The following are the major concepts in OUN:

• Objects, with internal activities and structure.
• Interfaces, with syntactic and semantic specification of methods.
• Classes, with state variables and imperative style implementation.
• Contracts, which specify the interaction between two or more objects.

All these concepts are specified in terms of history information: finite or infinite sequences of parameterized events that describe interactions between an object and its environment.
Consequently, only externally visible information about an object, such as its signature and operation invocations, is considered. An object is typed by an interface, in contrast to UML, where it is typed by a class. Objects can be created dynamically and can implement several interfaces. Multiple inheritance of interfaces and classes, as well as dynamic addition of interfaces and methods to classes, is supported.

3.3 Decomposition Style Adopted

The philosophy behind our decomposition style is to exploit efficiently the synergy between both formalisms. This should take into account their specific strengths and their complementary features. In OUN, a requirement specification is given in terms of interfaces and contracts; there is no class concept at that level, in contrast to UML. The concept of class appears in OUN later, during design specification. In this respect, we propose a decomposition style whose main steps are shown in Figure 1.

The process begins by providing a graphical specification of user requirements using UML modeling techniques. This consists of capturing user needs by defining use cases and corresponding interaction diagrams. It also includes class diagrams that define the structure of the system, and component and deployment diagrams that describe the system architecture.

The next step consists of refining the UML specification UML Spec1; all the components of the original specification are preserved, except classes. Classes are modified as follows: each class is refined as a pair of a class and an interface. The refined class keeps the name, the attributes and the non-public operations of the original class, while the interface consists of the public operations. Then, from this refined version of the UML class diagrams, labelled UML Spec2, we derive a complementary OUN specification, OUN Spec1. OUN complements UML by describing the invariants and constraints attached to the main constructs of UML such as types, classes, and interfaces.
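The class refinement step described above (from UML Spec1 to UML Spec2) can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the platform: the record types `Operation`, `UmlClass` and `UmlInterface`, the function `refine_class`, the default interface name `"I" + name`, and the decision to record the new interface among the interfaces realized by the refined class are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    name: str
    public: bool  # visibility flag (hypothetical encoding)


@dataclass(frozen=True)
class UmlInterface:
    name: str
    operations: frozenset


@dataclass(frozen=True)
class UmlClass:
    name: str
    attributes: frozenset
    operations: frozenset
    interfaces: frozenset = frozenset()  # names of realized interfaces


def refine_class(cls, interface_name=None):
    """Split a class into (refined class, interface).

    The refined class keeps the name, the attributes and the non-public
    operations; the interface receives the public operations.
    """
    public = frozenset(op for op in cls.operations if op.public)
    non_public = cls.operations - public
    # Assumption: the interface name is derived from the class name.
    iface = UmlInterface(interface_name or "I" + cls.name, public)
    # Assumption: the refined class realizes the newly created interface.
    refined = UmlClass(cls.name, cls.attributes, non_public,
                       cls.interfaces | {iface.name})
    return refined, iface


phone = UmlClass(
    "Phone",
    frozenset({"number"}),
    frozenset({Operation("call", True), Operation("log", False)}),
)
refined_phone, phone_iface = refine_class(phone)
```

Here `refined_phone` retains only the non-public operation `log`, while `phone_iface` carries the public operation `call`; applying `refine_class` to every class of a diagram would yield the class part of UML Spec2 under these assumptions.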
From a UML class diagram, we derive the OUN requirement specification, OUN Spec1, as follows:

• each interface in the UML class diagram is redefined as an interface in the OUN specification, with the same name and signatures of operations;
• generalization relationships among interfaces are preserved.

[Figure 1: Development Process. Nodes: Requ. specification, UML Spec1, UML Spec2, OUN Spec1, OUN Spec2, ...; steps: refinement 1, derivation 1, derivation 2, refinement 2, refinement 3.]

The OUN requirement specification obtained at the end of this step serves as the basis for design activities, which are performed within this formalism. Our first design product, OUN Spec2, is obtained by augmenting the OUN requirement specification with additional information derived from the refined UML class diagram produced previously. This additional information is obtained as follows: each UML class, generalization and realization is redefined correspondingly in the OUN model. Hence, the augmented specification OUN Spec2 is a refined version of OUN Spec1. From the interaction diagrams, we may identify the objects and events involved.

4 Consistency-Checking Scheme

4.1 Decomposition Style Revisited

Analysis of the decomposition statement highlights two kinds of properties that should be enforced: syntactic and semantic consistency. Syntactic consistency in this setting ensures that some specific constructs of the UML specification, such as class, interface and generalization, are uniquely and consistently redefined in terms of OUN constructs. Semantic consistency ensures that knowledge shared by both models yields the same meaning. This includes, for instance, checking that the invariant and assumption defined for an OUN interface hold for an instance corresponding to a UML object identified in an interaction diagram.
Another aspect of the decomposition style is the different steps involved (see Figure 1), which call for different kinds of checks. There are at least two refinement steps, from UML Spec1 to UML Spec2, and from OUN Spec1 to OUN Spec2. Our consistency scheme is concerned mainly with the derivations from UML Spec2 to OUN Spec1 and from UML Spec2 to OUN Spec2, and hence takes the form of specific relations valid for each step.

4.2 Abstract Syntax Definition

We give an abstract syntax for UML and OUN constructs using a variant of BNF [Nau60]. Curly brackets are used to indicate a set of items, possibly empty, whereas square brackets denote sequences, possibly empty. We put emphasis on the definition of constructs that are relevant to our consistency checking scheme, and we give details only when necessary. We give the following definitions:

4.2.1 UML specification

A UML specification may consist of several kinds of diagrams, among which the most relevant to this work are class diagrams and interaction diagrams.

Spec_uml ::= {Class diagram | Interaction diagrams | Other diagrams}
Class diagram ::= classes interfaces generalizations Others
Interaction diagram ::= objects traces

A class diagram consists of a set of classes, a set of interfaces, a set of generalization relationships and several other kinds of constructs (not relevant in this context). An interaction diagram can be represented by a set of objects and a set of traces of events describing possible sequences of interactions among the objects. We consider two kinds of generalization: generalization among interfaces and generalization involving classes.
classes ::= {class_uml}
interfaces ::= {interface_uml}
generalizations ::= generalizations_intf | generalizations_cl
generalizations_intf ::= {generalization_intf^uml}
generalizations_cl ::= {generalization_cl^uml}
objects ::= {object_uml}
traces ::= {trace}
trace ::= [event]

We represent a class by its name and its sets of attributes, operations and interfaces. An interface is represented by its name and set of operations.

class_uml ::= name attributes operations interfaces
interface_uml ::= name operations
attributes ::= {attribute}
operations ::= {operation}

Class generalizations are represented by two sets of classes, representing respectively the superclass(es) and the subclasses involved.

generalization_cl^uml ::= Sup_cl Sub_cl
Sup_cl ::= {class_uml}
Sub_cl ::= {class_uml}

We define interface generalization analogously:

generalization_intf^uml ::= Sup_intf Sub_intf
Sup_intf ::= {interface_uml}
Sub_intf ::= {interface_uml}

An object is represented by its name, its class and its set of possible traces.

object_uml ::= name class traces

4.2.2 OUN specification

An OUN specification may consist of one of two kinds of components. The first component, labelled here as Specif, is provided at the requirement specification level and consists of a set of contracts, a set of interfaces and a set of generalizations among these interfaces. The second component, labelled Implem, is provided during design specification. It consists of the same items as Specif, possibly augmented by a set of classes and a set of class generalizations. A contract is a kind of glass-box specification, which restricts the interactions among several objects and enables us to express more global properties [OR99]. An example of a contract is given in appendix A.2.
Spec_oun ::= Specif | Implem
Specif ::= interfaces generalizations_intf contracts
Implem ::= Specif classes generalizations_cl
interfaces ::= {interface_oun}
contracts ::= {contract}
generalizations ::= generalizations_intf | generalizations_cl
generalizations_intf ::= {generalization_intf^oun}
generalizations_cl ::= {generalization_cl^oun}
classes ::= {class_oun}

We represent an OUN class or interface by the same elements as the corresponding constructs in UML, with two additional fields, one for the invariant and the other for the assumption involved. An invariant asserts properties that each object providing the interface must satisfy, and an assumption describes minimal context requirements. Thus, assuming that the conditions described by the assumption hold, the invariant should always be true for any object of the corresponding interface. Each object has an implicit local variable that represents its history, i.e. the sequence of the method calls involving the object since its creation. Assumptions and invariants are expressed as predicates on the communication history.

class_oun ::= name attributes operations interfaces assumption invariant
interface_oun ::= name operations assumption invariant

A contract is represented by the set of interfaces involved and an invariant. We represent generalizations in the same way as in the UML syntax.

contract ::= interfaces invariant
generalization_intf^oun ::= Sub_intf Sup_intf
Sup_intf ::= {interface_oun}
Sub_intf ::= {interface_oun}
generalization_cl^oun ::= Sup_cl Sub_cl
Sup_cl ::= {class_oun}
Sub_cl ::= {class_oun}

Since an object is typed by an interface in OUN, we represent an object by its name and interface.
object_oun ::= name interface

4.3 Definition of a Consistency Relation

We provide an inductive definition of a consistency relation, say C, consisting of definitions based on semantic equivalence between the various UML and OUN constructs and the rules underlying the decomposition style adopted.

4.3.1 Mapping an Interface

An interface in a UML class diagram is redefined as an OUN interface with the same name and set of operations.

∀ i : interface_uml, i' : interface_oun •
C(i, i') ⇔ (i.name = i'.name) ∧ (i.operations = i'.operations)

4.3.2 Mapping a Class

A class in a UML class diagram is redefined as an OUN class with the same name, and a set of attributes and operations that include the set of attributes and operations of the corresponding UML class (the possibility of class extension in OUN is taken into account). Additionally, each interface implemented by the UML class should be related to an interface implemented by the OUN class.

∀ c : class_uml, c' : class_oun •
C(c, c') ⇔ (c.name = c'.name) ∧
 (c.attributes ⊆ c'.attributes) ∧
 (c.operations ⊆ c'.operations) ∧
 (∀ i ∈ c.interfaces, ∃! i' : i' ∈ c'.interfaces • C(i, i'))

In the above definition, the attributes, operations, and interfaces of a class also include those inherited from its parent classes.

4.3.3 Mapping an Object

A UML object is mapped to an OUN object having the same name, and whose interface should be related to a UML interface implemented by the UML object.

∀ o : object_uml, o' : object_oun •
C(o, o') ⇔ (o.name = o'.name) ∧
 (∃ i : i ∈ o.class.interfaces • C(i, o'.interface))

4.3.4 Mapping generalization relationships: A UML generalization is mapped to an OUN generalization if the elements of the UML superclass (respectively subclass) can be related bijectively to the elements of the OUN superclass (respectively subclass).
∀ G : generalization^uml, G' : generalization^oun •
C(G, G') ⇔ (∀ c ∈ G.Sup, ∃! c' ∈ G'.Sup • C(c, c')) ∧
 (∀ c ∈ G.Sub, ∃! c' ∈ G'.Sub • C(c, c')) ∧
 (#G.Sup = #G'.Sup) ∧ (#G.Sub = #G'.Sub)

The operator # is used to return both the length of a sequence and the cardinality of a set.

4.3.5 Mapping a class diagram

A class diagram is related to the kind of OUN specification denoted by Specif if each UML interface or interface generalization can be related uniquely to corresponding items in Specif.

∀ Cd : Class diagram, Sp : Specif •
C(Cd, Sp) ⇔ (∀ i ∈ Cd.interfaces, ∃! i' : i' ∈ Sp.interfaces • C(i, i')) ∧
 (∀ g ∈ Cd.generalizations_intf, ∃! g' : g' ∈ Sp.generalizations_intf • C(g, g'))

A class diagram is related to the kind of OUN specification denoted by Implem if it is related to the Specif component of Implem, and if all the UML classes and class generalizations are uniquely related to corresponding items in Implem.

∀ Cd : Class diagram, Im : Implem •
C(Cd, Im) ⇔ C(Cd, Im.Specif) ∧
 (∀ c ∈ Cd.classes, ∃! c' : c' ∈ Im.classes • C(c, c')) ∧
 (∀ g ∈ Cd.generalizations_cl, ∃! g' : g' ∈ Im.generalizations_cl • C(g, g'))

4.3.6 Mapping interaction diagrams: We can relate interaction diagrams to different kinds of constructs in OUN, the objective being to capture some semantic concepts. In this work, we provide three such definitions. The first definition is as follows:

∀ ids : P(Interaction diagram), Intf : P(interface_oun) •
C(ids, Intf) ⇔ (∀ Id ∈ ids, o ∈ Id.objects, F ∈ o.class.interfaces, G ∈ Intf •
 C(F, G) ⇒ (∀ H ∈ Id.traces/o, ∃ H_o ∈ o.traces •
 (H in H_o) ∧ (⋀_{P→G} P.assumption(H_o) ⇒ P.invariant(H_o))))

where P denotes the powerset operator. A set of interaction diagrams is consistently related to a set of OUN interfaces if, for each object involved in an interaction diagram, we can find a corresponding OUN object for which the invariants and assumptions on the related interface hold.
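Stepping back to the purely structural clauses of C (interfaces, classes and generalizations, Sections 4.3.1, 4.3.2 and 4.3.4), these can be read as directly executable checks, which is what makes them algorithmically decidable. The following Python sketch is only an illustration under stated assumptions: the record types and the function names `c_interface`, `c_class`, `c_generalization` and `exactly_one` are hypothetical, operations and attributes are modeled as plain strings, the same record type is reused on the UML and OUN sides (the OUN assumption and invariant fields play no role in these clauses), and inherited members are assumed to be already included in each class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interface:
    name: str
    operations: frozenset


@dataclass(frozen=True)
class Class:
    name: str
    attributes: frozenset
    operations: frozenset
    interfaces: frozenset  # set of Interface records


def exactly_one(pred, items):
    """Executable reading of the unique-existence quantifier."""
    return sum(1 for x in items if pred(x)) == 1


def c_interface(i, i2):
    # Section 4.3.1: same name and same set of operations.
    return i.name == i2.name and i.operations == i2.operations


def c_class(c, c2):
    # Section 4.3.2: same name, attributes and operations included,
    # each UML interface related to exactly one OUN interface.
    return (c.name == c2.name
            and c.attributes <= c2.attributes
            and c.operations <= c2.operations
            and all(exactly_one(lambda j, i=i: c_interface(i, j),
                                c2.interfaces)
                    for i in c.interfaces))


def c_generalization(g, g2):
    # Section 4.3.4: g and g2 are (superclasses, subclasses) pairs of sets.
    (sup, sub), (sup2, sub2) = g, g2

    def uniquely_related(xs, ys):
        return all(exactly_one(lambda y, x=x: c_class(x, y), ys) for x in xs)

    return (uniquely_related(sup, sup2) and uniquely_related(sub, sub2)
            and len(sup) == len(sup2) and len(sub) == len(sub2))


show = Interface("Show", frozenset({"display"}))
uml_screen = Class("Screen", frozenset({"w"}), frozenset({"draw"}),
                   frozenset({show}))
oun_screen = Class("Screen", frozenset({"w", "h"}),
                   frozenset({"draw", "clear"}), frozenset({show}))
base = Class("Base", frozenset(), frozenset(), frozenset())
```

With these sample records, `c_class(uml_screen, oun_screen)` holds because the OUN class only extends the UML class, whereas the converse direction fails on the attribute inclusion.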
The “p in q” operation on sequences of events states that the sequence p occurs consecutively in the sequence q. We also use the projection operator, denoted by “/”. H/o, also denoted by H_o, represents the projection of the history H onto the set of method calls involving object o. ⋀_{P→G} denotes the conjunction of the assumption/guarantee pairs related to any super-interface P of interface G or to G itself.

The with clause used in the definition of an interface F asserts that only interfaces listed in the clause may interact with objects of F through the listed operations (see appendix A.2 for an example). The projection H/F of the history onto interface F is the projection of H onto the set of methods defined in F and in the interfaces appearing in the with clause of F and of its possible super-interfaces. We denote by H/F o the projection of the history onto the set of methods defined in interface F and received by object o, or defined in the interfaces appearing in the with clause of G and called by o.

The second definition relates a set of interaction diagrams to a set of OUN classes if, for each object involved in the interaction diagrams, a corresponding OUN object respects the invariant and assumption of the corresponding OUN class.

∀ ids : P(Interaction diagram), Cl : P(class_oun) •
C(ids, Cl) ⇔ (∀ Id ∈ ids, o ∈ Id.objects, G ∈ Cl •
 C(o.class, G) ⇒ (∀ H ∈ Id.traces/o, ∃ H_o ∈ o.traces •
 (H in H_o) ∧ (⋀_{P→G} P.assumption(H_o) ⇒ P.invariant(H_o))))

The third definition relates a set of interaction diagrams to a contract. Given an interaction diagram in the set, and a set of objects involved in this interaction diagram, if the related OUN objects are involved in a contract, the invariant of the contract should hold.

∀ ids : P(Interaction diagram), C : contract •
C(ids, C) ⇔ (∀ Id ∈ ids, H ∈ Id.traces, ∃ H_c : trace •
 (H/(⋃_{Fi ∈ C.interfaces} Fi) in H_c) ∧ C.invariant(H_c))
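The trace operators used in these three definitions can be illustrated with a small Python sketch. It is not the authors' tooling: the encoding of an event as a `(caller, callee, method)` tuple, the function names `occurs_in`, `project` and `holds`, and the representation of assumption/invariant pairs as Python predicates over a history are all assumptions made for the example.

```python
def occurs_in(p, q):
    """'p in q': True if sequence p occurs consecutively in sequence q."""
    p, q = list(p), list(q)
    n = len(p)
    return any(q[k:k + n] == p for k in range(len(q) - n + 1))


def project(history, obj):
    """H/o: keep only the events (caller, callee, method) involving obj."""
    return [event for event in history if obj in (event[0], event[1])]


def holds(pairs, history):
    """Conjunction of (assumption implies invariant) over the given pairs.

    Each pair is (assumption, invariant), both predicates on a history.
    """
    return all((not assumption(history)) or invariant(history)
               for assumption, invariant in pairs)


# A sample history of method calls (hypothetical names).
H = [("user", "phone", "dial"),
     ("phone", "net", "connect"),
     ("user", "tv", "on"),
     ("net", "phone", "ack")]

H_phone = project(H, "phone")

# One assumption/invariant pair: if the history is non-empty,
# its first event must be a "dial".
pairs = [(lambda h: len(h) > 0, lambda h: h[0][2] == "dial")]
```

In this sample, `H_phone` drops the event involving only `user` and `tv`, the fragment `[H[1], H[3]]` occurs consecutively in `H_phone` although it does not in `H`, and `holds(pairs, H_phone)` evaluates the implication on the projected history. Checking such predicates on particular finite traces is decidable; the general statement over all histories is what gives rise to proof obligations.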
Hence, we provide the following definitions, which relate interaction diagrams to the different kinds of OUN specifications: the Specif component (including OUN interfaces and contracts) and the Implem component.

∀ ids : P(Interaction diagram), Sp : Specif •
C(ids, Sp) ⇔ C(ids, Sp.interfaces) ∧ (∀ C ∈ Sp.contracts • C(ids, C))

∀ ids : P(Interaction diagram), Im : Implem •
C(ids, Im) ⇔ C(ids, Im.Specif) ∧ C(ids, Im.classes)

4.3.7 Consistency relation: On the basis of the previous definitions, we provide the general definition of our consistency relation as follows:

∀ Spec1 : Spec_uml, Spec2 : Spec_oun •
C(Spec1, Spec2) ⇔ C(Spec1.Class diagram, Spec2) ∧ C(Spec1.Interaction diagrams, Spec2)

5 Case Study

We have developed a case study dealing with a mobile phone network, adapted from [OP92]. The objective was to check the definitions provided for C (see section 4.3.7). The definitions related to syntactic consistency were checked algorithmically. The abstract syntax of both the UML and the OUN specifications was provided and processed in order to check for incomplete or missing cases. The definitions concerning semantic consistency were undecidable and required the generation of corresponding proof obligations. An overview of the case study and some of the proof obligations generated is given in the appendix.

6 Conclusion

The approach we have introduced meets all the requirements that are outlined in the introduction. Some of the checks involved may seem simplistic or trivial. But we must keep in mind that the kinds of errors at which they are targeted, that is, missing cases and misconceptions, are undoubtedly among the most frequent sources of errors when we are dealing with large specifications. The kinds of tools proposed are useful in this context since they may help developers to keep track of all the details in a consistent way.
Another characteristic of our approach is that it represents a preliminary step before undertaking general validation activities, which may be more complex. For instance, formulas such as the one related to assumptions and invariants are checked in particular cases. This is useful before undertaking the general proof covering the whole history, since the general proof may be time-consuming and more complex. Another important aspect of the approach is automation. In the particular case presented in section 3, we are developing a supporting environment, called Integrator [TS99], which encompasses all functionalities from requirements capture to code generation. The Integrator includes specific components for verification and validation, which consist of a parser and a type checker for each language, a consistency checker, an animator, a proof generator and a theorem prover. Type checking and theorem proving are based on the facilities provided by the PVS toolkit.

References

[BDS96] H. Bowman, J. Derrick, and M.W.A. Steen. Viewpoint Consistency in ODP, a general interpretation. In E. Najm and J.-B. Stefani, editors, the Proc. of 1st IFIP International Workshop on Formal Methods for Open Object-Based Distributed Systems, pages 189–204. Chapman & Hall, March 1996.
[BRJ99] G. Booch, J. Rumbaugh, and I. Jacobson. The Unified Modeling Language User Guide. Addison Wesley Longman Inc, Reading Massachusetts 01867, 1999.
[Cas94] V. Cassigneul. How to Control the Increase in Complexity of Civil Aircraft On-board Systems, 1994. AEROSPATIALE Aircraft, Internal Report.
[ECW98] S. Easterbrook, J. Callahan, and V. Wiels. V&V Through Inconsistency Tracking and Analysis. In the Proc. of International Workshop on Software Specification and Design, Ise-Shima, Japan, April 16-18 1998.
[FGH+93] A. Finkelstein, D. Gabbay, A. Hunter, J. Kramer, and B. Nuseibeh. Inconsistency Handling in Multi-Perspectives Specifications. In the Proc. of 4th European Software Engineering Conference (ESEC’93): LNCS 717, pages 84–99, Garmisch-Partenkirchen, Germany, September 1993. Springer-Verlag.
[GHM98] J. Grundy, J. Hosking, and W. B. Mugridge. Inconsistency Management for Multiple-View Software Development Environments. IEEE Trans. On Soft. Eng., 24(10), October 1998.
[GJM91] C. Ghezzi, M. Jazayeri, and D. Mandrioli. Fundamentals of Software Engineering. Prentice-Hall International, 1991.
[GN98] C. Ghezzi and B. Nuseibeh. Managing Inconsistency in Software Development. IEEE Trans. On Soft. Eng., 24(10), November 1998. Introduction To The Special Section.
[HJL96] C. L. Heitmeyer, R.D. Jeffords, and B.G. Labaw. Automated Consistency Checking of Requirements Specifications. ACM Trans. on Software Engineering and Methodology, 5(3):231–261, July 1996.
[HL96] M. Heimdahl and N. Leveson. Completeness and Consistency Analysis of State-Based Requirements. IEEE Trans. On Software Engineering, 22:363–377, November 1996.
[HN97] A. Hunter and B. Nuseibeh. Analyzing Inconsistent Specifications. In the Proc. RE’97, 3rd Int’l Symp. Req. Eng., pages 78–86, Annapolis, Md., 1997.
[JTC95] ISO-IEC JTC1/SC21/WG7. Reference Model of Open Distributed Processing (RM-ODP), 1995.
[JZ98] M. Jackson and P. Zave. Distributed Feature Composition: A Virtual Architecture for Telecommunications Services. IEEE Trans. On Soft. Eng., 24(10), October 1998.
[LDL98] A. V. Lamsweerde, R. Darimont, and E. Letier. Managing Conflicts in Goal-Driven Requirements Engineering. IEEE Trans. On Soft. Eng., 24(10), October 1998.
[Nau60] P. Naur. Revised Report on the Algorithmic Language ALGOL 60. Communications of the ACM, pages 299–314, May 1960.
[NKF94] B. Nuseibeh, J. Kramer, and A. Finkelstein. A Framework for Expressing The Relationships between Multiple Views in Requirement Specification. IEEE Trans. On Soft. Eng., 20(10):760–773, October 1994.
[OMG99] OMG. OMG Unified Modeling Language Specification, version 1.3, June 1999. OMG standard.
[OP92] F. Orava and J. Parrow. An Algebraic Verification of a Mobile Network. Journal of Formal Aspects of Computing, 4:497–543, 1992.
[OR99] O. Owe and I. Ryl. The Oslo University Notation: A Formalism for Open, Object-Oriented, Distributed Systems. Report No. 270, August 1999. Department of Informatics, University of Oslo, Norway.
[ORSH95] S. Owre, J. Rushby, N. Shankar, and F.V. Henke. Formal Verification for Fault-tolerant Architectures: Prolegomena to the design of PVS. IEEE Transactions On Software Engineering, 21(2):107–125, February 1995.
[OSRSC99] S. Owre, N. Shankar, J. Rushby, and D. W. Stringer-Calvert. PVS System Guide, version 2.3. Computer Science Laboratory, SRI International, Menlo Park, CA, September 1999.
[TS99] I. Traoré and K. Stølen. Towards the Definition of a Platform supporting the Formal Development of Open Distributed Systems. Research report No. 271, April 1999. Department of Informatics, University of Oslo, Norway.

A Appendix: Overview of the Case Study

[Figure 2: A Mobile Phone System. The figure shows a Car, two base stations (Base1 and Base2) and the Centre, with the labels talk1, switch1, alert1, give1, alert2 and give2.]

We deal with a network of mobile phones (see Figure 2). A mobile phone is embedded in a car, which moves about the country. The telephone system consists of a center permanently in contact with two base stations, each covering a different area of the country and handling several mobile phones at the same time. A telephone should always be in contact with a base; if it is about to go out of the area of its current base, it requests reconnection. The current base transmits this information to the center, which is in charge of new channel allocation. As soon as the car obtains its new channels, it relinquishes contact with its current base and assumes contact with the other. The current base becomes idle and at the same time the other base is told to become active on corresponding channels.
We assume that before the center transmits a disconnect order to the current base, it should receive a confirmation from the selected base.

A.1 UML Specifications

Figure 3 depicts the UML class diagram corresponding to UML Spec1 (in Figure 1). Class Center defines an operation for channel selection. Class Station implements two interfaces, each corresponding to a different configuration of a station: active base and idle base. There is also a class representing a car and another representing a pair of communication channels. Figure 4 depicts a refined version of the class diagram in Figure 3 and corresponds to UML Spec2. Each class in the class diagram is refined as a pair of a class and an interface.

[Figure 3: A UML Class Diagram for the Mobile Phone System. The diagram shows the classes Car, ChannelPair, Station and Center, and the interfaces Base and IdleBase.]

We describe the interactions among objects by means of a collaboration diagram (see figure 5). There are three kinds of objects: C, S and V, respectively a center, a station and a vehicle. In the initial configuration, V is connected to S, which is active: V may talk repeatedly with S. When V gets rather far from S, it requests new channels. This information is retransmitted to C by S, and C selects an appropriate channel and gets confirmation from the corresponding station. When V receives its new channel, it invokes reconnection. At the same time, S becomes idle.

A.2 OUN Specification

In the following, we provide only OUN Spec1, which is derived from UML Spec2.
This specification is provided in terms of interfaces and contracts; $H/\to$ denotes the projection of the history onto the set of all the initiation events. The signatures of operations implemented by an interface are preceded by the keyword ops. We use below the notation prs to describe a prefix of a regular sequence.

[Figure 4: A Refined UML Class Diagram. The diagram pairs each class of Figure 3 with an interface: Car with ICar, ChannelPair with IChannelPair, Center with ICenter, and Station with Base and IdleBase.]

[Figure 5: UML Interaction Diagram. Objects: C: Center, S: Station [Base], S: Station [IdleBase], V: Car. Messages: *1: talk(); 2: reqNewCh(o); 2.1: n = selectChannel(o); 2.2: goToActive(n); 2.3: confirm(n); 3: disconnect(o,n); 3.1: reconnect(n); 3.2: goToIdle(o); 3.3: <<become>>.]

interface IChannelPair
begin
end

interface ICar
begin
  with IdleBase
  ops talk()
  ops reconnect(n : ChannelPair)
end

interface Center-role1
begin
  with Base-role1
  ops selectChannel(o : ChannelPair)
end

interface ICenter inherits Center-role1
begin
  with IdleBase
  ops confirm(n : ChannelPair)
  asm (H/→) prs [goToActive(n) confirm(n)]*
  inv (H/→) prs [goToActive(n) confirm(n)]*
end

interface Base-role1
begin
  with Center-role1
  ops disconnect(o : ChannelPair, n : ChannelPair)
  asm (H/→) prs [selectChannel(o) disconnect(o,n)]*
  inv (H/→) prs [selectChannel(o) disconnect(o,n)]*
end

interface Base inherits Base-role1
begin
  with ICar
  ops reqNewCh(o : ChannelPair)
end

interface IdleBase
begin
  with ICenter
  ops goToActive(n : ChannelPair)
  inv (H/→) prs [goToActive(n) confirm(n)]*
end

contract BaseChange (ICenter, Base, ICar)
  inv (H/→) prs [reqNewCh(o) selectChannel(o) disconnect(o,n) reconnect(n)]*
end

contract Switch (ICenter, Base, IdleBase)
  inv b.id ≠ ib.id ⇒ (H/→) prs [selectChannel(o) goToActive(n) confirm(n) disconnect(o,n)]*
end

The invariant on interface IdleBase ensures that when the center selects a channel, it should receive a confirmation. By assuming that this requirement holds, interface ICenter will expect that a selectChannel message from a Base is followed by a disconnect message to that Base. Contracts BaseChange and Switch describe the interactions involved during station switching from different perspectives. The notation id is used in their invariants to denote an object identifier.

A.3 Tracking Inconsistencies

In this specific example, we need to check the definitions of the respective invariants, which relate UML interaction diagrams with OUN interfaces and contracts. The definitions related to interfaces require checking an “A ⇒ I” kind of formula (A being an assumption and I an invariant). This is trivial for all the interfaces listed in OUN Spec1 (since there is no invariant), except for interface IdleBase, which gives rise to one obligation as follows:

$$\vdash \exists\, H_s : trace \bullet\ ([reqNewCh(o)\ selectChannel(o)\ goToActive(n)\ confirm(n)\ disconnect(o,n)\ reconnect(n)\ goToIdle(o)]\ \mathbf{in}\ H_s) \wedge ((H_s / IdleBase_{Idlebase} / \to)\ \mathbf{prs}\ [goToActive(n)\ confirm(n)]^*)$$

The definition related to contracts gives rise to two obligations as follows:

$$\vdash \exists\, H_{bc} : trace \bullet\ ([talk()^*\ reqNewCh(o)\ selectChannel(o)\ disconnect(o,n)\ reconnect(n)\ goToIdle(o)]\ \mathbf{in}\ H_{bc}) \wedge ((H_{bc} / (ICenter \cup Base \cup ICar) / \to)\ \mathbf{prs}\ [reqNewCh(o)\ selectChannel(o)\ disconnect(o,n)\ reconnect(n)]^*).$$

$$\vdash \exists\, H_{sw} : trace \bullet\ ([selectChannel(o)\ goToActive(n)\ confirm(n)\ disconnect(o,n)\ goToIdle(o)]\ \mathbf{in}\ H_{sw}) \wedge ((H_{sw} / (ICenter \cup Base \cup IdleBase) / \to)\ \mathbf{prs}\ [selectChannel(o)\ goToActive(n)\ confirm(n)\ disconnect(o,n)]^*)$$

Appendix G Enhancing Structured Review with Model-based Verification I. Traoré and D. B.
Aredo Publication: I. Traoré and D. B. Aredo: Enhancing Structured Review with Model-based Verification, IEEE Transactions on Software Engineering (to appear). This article is a revised and extended version of a paper presented at a CAV’01 Workshop on Inspection in Software Engineering (WISE’01), Paris, France, July 2001. Enhancing Structured Review with Model-based Verification∗ Issa Traoré† Demissie B. Aredo‡ Abstract In this paper, we propose a development framework that extends the scope of structured review by supplementing the structured review with model-based verification. The proposed approach uses the Unified Modeling Language (UML) as a modeling notation. We discuss a set of correctness arguments that can be used in conjunction with formal verification and validation (V&V) in order to improve the quality and dependability of systems in a cost-effective way. Formal methods can be esoteric; consequently, their large scale application is hindered. We propose a framework based on integration of lightweight formal methods and structured reviews. Moreover, we show that structured reviews enable us to handle aspects of V&V that cannot be fully automated. To demonstrate the feasibility of our approach, we have conducted a study on a security-critical system - a patient document service (PDS) system. Keywords: Structured review, Formal Methods, UML, Prototype Verification System (PVS), OCL, Model-based verification, Validation & Verification. 1 Introduction The software industry is currently facing the challenge of developing systems with a high level of quality assurance at a reasonable cost and time delay. The pressure to be the first in the market has drastically compressed the development process. Software products are often delivered without the minimal quality assurance criteria, with vendors often relying on the patience and skills of customers to discover and report bugs. 
Though lower costs and rapid delivery seem to be the main issues in the contemporary marketing environment, meeting some level of quality assurance is still an important concern in the highly competitive market. Software quality may significantly improve by integrating formal verification and validation (V&V) into the development process. V&V is the whole range of software analysis processes that encompasses requirements, design, and program code reviews, and testing. According to studies in the literature [38, 18, 30], structured review is an effective and cheap error detection technique. Conventional review approaches use ad hoc or checklist-based reading (CBR) techniques [14, 18]. Ad hoc techniques do not specify any explicit method for finding defects, but rather rely solely on reviewers’ intuitions and experiences. A CBR technique provides some guidance in the form of questionnaires based on past experiences in detecting defects and on specific rules. The number of questions in a CBR, however, tends to be overwhelming. Moreover, there is no concrete guidance concerning how questions should be answered. An alternative approach, in which reviewers play more proactive roles, is the Active Design Review (ADR) technique proposed by Parnas et al. [37]. The level of quality assurance achieved with structured review techniques, however, may not be sufficient for critical systems, where a failure may result in significant economic losses, physical damage, or threat to human life.

∗ An earlier and shorter version appeared in the Proc. of the Workshop on Inspection in Software Engineering (WISE’01), Paris, France, July 2001.
† Issa Traoré is with the Department of Electrical and Computer Engineering, University of Victoria, Canada. E-mail: itraore@ece.uvic.ca
‡ Demissie B. Aredo is with the Norwegian Computing Center, N-314 Oslo, Norway. E-mail: aredo@nr.no
Structured reviews are effective in checking correctness arguments such as completeness, robustness, and optimality of a design decision. Checking of arguments such as optimality is usually based on intuition and experience, as they can only be partially inspected using systematic and automated approaches, e.g. code smell detectors [46]. On the other hand, arguments such as traceability can be checked by following a restricted set of guidelines and rules. When the number of guidelines, however, becomes significantly large, manual review is not feasible: reviewers can be overwhelmed and forget or mismatch some of the rules. Structured review is not efficient in checking model validity, which is usually checked by analyzing semantics of the model against requirements in order to discover inconsistencies. These issues are addressed by formal analysis, where models are given precise semantics, and tools are used to check various scenarios mechanically. Although system reliability can be improved by using formal analysis techniques, the esoteric nature of formal methods, and the need for intensive user interaction with the verification environment, impose significant barriers on their application to large scale systems. To address this, strategies for integrating formal methods into the software development process have been proposed to exploit the synergy between formal and semi-formal methods [20, 11, 45, 29, 33]. In the sequel, we propose an approach that enhances structured review with formal V&V techniques by extending the scope of correctness arguments that can be checked by structured review. We chose the ADR approach as a basis of the extension, since both ADR and formal techniques require reviewers to play a proactive role during the review process. For the model-based verification, we use an integration of the Unified Modeling Language (UML) [35] and the Prototype Verification System (PVS) [36].
We propose formal semantics for UML notation using the specification language of PVS. Based on the semantic definition we developed a CASE tool known as Precise UML Development Environment (PrUDE) [42]. The PrUDE platform integrates the graphical UML notation as a front-end to the PVS verification tools. To minimize the difficulties related to interactive proof checking, we define proof strategies to automate proof checking based on semantic definitions for UML notations. For complex proof obligations that cannot be automated, we suggest that the designer records informal correctness arguments to be challenged during a review process. The rest of the paper is organized as follows. In Section 2, we discuss concepts of structured review, such as review arguments, review process, and units of review. In Section 3, we report on a feasibility study of our approach based on the requirements and models of a security-critical application. In Section 4, we present a model-based verification approach that supplements our structured review framework. In Section 5, we discuss how the proposed framework can be used in test model review. In Section 6, we discuss related works. Finally, in Section 7, we draw some conclusions and discuss research issues for future work. 2 Concepts of Structured Reviews 2.1 Review Arguments It is important to relate implementation or design elements to requirements. Generating such relationships exposes crucial errors, misconceptions, and omissions. We advocate the use of informal correctness arguments in order to bridge the gap between specification, design and implementation. Our approach draws on the work of Britcher [5], where key program attributes, such as topology, algebra, invariance, and robustness are defined for procedural programs. Correctness arguments are presented as a series of questionnaires that should be answered by the reviewers. The formulation of the questionnaires follows the Active Design Review (ADR) approach [37]. 
We consider the following six correctness arguments to encompass and extend the criteria defined in [5]: validity, traceability, optimality, robustness, well-formedness and consistency. Though some of these arguments are overlapping, they provide a good coverage of the most important concerns raised with respect to correctness of a design model.

Validity is concerned with the conformance of a specification to customer requirements. In order to check validity of a model, the reviewer draws some conjectures from the requirements and checks the conjectures against the model. The questions that should be answered for this argument include the following:
1. Do the exhibits provide complete coverage of the business rules, properties and invariants characterizing the system?
2. Are the exhibits consistent with the requirements?

Traceability relates requirement and design specifications. Questions that should be answered by reviewers are intended to achieve structural and behavioral conformances between corresponding abstract and refined specifications. Questions that should be answered for this argument include the following:
1. Which aspects of the model have changed, and which ones remain unchanged by the refinement?
2. Are the relationships between abstract and concrete elements adequate and consistent?

Optimality deals with appropriateness and efficiency of design decisions. Optimality of a design can be analyzed by answering questions such as the following:
1. Are the representations chosen during the refinement step efficient with respect to the requirements?
2. Are there other alternatives that are better solutions?

Robustness deals with the handling of abnormal or exceptional situations. Questions that are asked during the review should focus on detecting omissions and gaps in the design. The following are some of the questions that could be raised for the robustness argument:
1. What are the normal conditions under which the system operates?
2. What are the exceptional and abnormal conditions related to the system operation? Are they handled correctly?

Well-formedness is mainly concerned with a correct use of notations to describe design models. A model is said to be well-formed if all syntactic rules underlying the notation are enforced.

Consistency is the broadest of all correctness arguments defined so far. Some of the above arguments may fall under the consistency category. Most inconsistencies in UML models can be captured by UML CASE tools; however, a few of the inconsistencies may not be caught.

2.2 Review Process

2.2.1 Development process and units of review

The UML standard document [35] defines modeling notations without any guidance concerning their use. We use a development process that is based on the Rational Unified Process (RUP) [24], which is used in conjunction with UML in many software development organizations. RUP is an iterative and incremental development process aimed at mitigating risks [24]. The process begins by identifying use cases from the customer requirements. The use cases are analyzed iteratively by focusing primarily on the most critical use cases. A critical use case is a use case that contains a significant risk for the system, or that covers quality requirements such as performance, availability, and security. In conventional review, requirements and design specifications and program code are used as units of review. According to Laitenberger et al. [27], document-centric approaches are appropriate for procedural systems, but they fail to meet the challenges raised by object-oriented systems, for which there is no clear-cut boundary between the different artifacts involved in the software life cycle. For UML models, an architecture-centric approach with a component as a unit of review is suggested.
In contrast to Laitenberger et al., we combine the architecture-centric and document-centric approaches. We use a key building block of software architecture, namely the use case, as a unit of review. Within the use case, we organize the review around different documents such as requirements, analysis, design specification and testing, as described in the next section.

2.2.2 Major phases of the review process

The review process shown in Figure 1 consists of four major phases: user requirements review, analysis models review, design models review, and test data review. The requirements review is based on the use case model and hence, all use cases are considered at this stage. The three subsequent steps are repeated iteratively for every use case, as the use case is the unit of review. Use cases are integrated progressively after every iteration by analyzing the possible inconsistencies that may arise from overlaps. For instance, there is a many-to-many relationship between use cases and the objects or components that implement their functionality. This may result in inconsistencies in the representation of the objects across relevant use cases. During the integration, the reviewer manually checks that each object is represented consistently across the use cases where it appears.

Review of user requirements: In UML, user requirements are described typically by use case models. Review activities in this phase consist of checking completeness and consistency arguments. Completeness refers to checking whether or not a useful piece of information is missing from the use case model. More specifically, the reviewer must ensure that all functional and quality requirements of the system are covered by at least one use case. For every use case, the reviewer must check that every identified scenario is captured by a flow of events. The reviewer also manually checks consistency of use case descriptions with the original requirements.
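The two completeness checks described above are performed manually in the proposed process, but their logic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the requirement identifiers, use case names and the dictionary encoding are hypothetical and are not part of the PrUDE tool.

```python
from typing import Dict, Iterable, List, Set


def uncovered_requirements(requirements: Iterable[str],
                           use_cases: Dict[str, Set[str]]) -> List[str]:
    """Requirements that are not covered by at least one use case."""
    covered: Set[str] = set().union(*use_cases.values())
    return sorted(set(requirements) - covered)


def scenarios_without_flow(scenarios: Dict[str, Set[str]],
                           flows: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """For each use case, the identified scenarios with no flow of events."""
    missing: Dict[str, List[str]] = {}
    for use_case, names in scenarios.items():
        gap = sorted(names - flows.get(use_case, set()))
        if gap:
            missing[use_case] = gap
    return missing


# Hypothetical review input, for illustration only.
requirements = ["R1", "R2", "R3"]
use_cases = {"UC1": {"R1"}, "UC2": {"R1", "R2"}}
scenarios = {"UC1": {"main", "rejected"}, "UC2": {"main"}}
flows = {"UC1": {"main"}, "UC2": {"main"}}

print(uncovered_requirements(requirements, use_cases))
print(scenarios_without_flow(scenarios, flows))
```

A non-empty result from either function would be recorded as a completeness defect; judging whether a use case description is consistent with the original requirements remains a manual task.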
Review of analysis models: The arguments that are checked in this phase are consistency, well-formedness and validity. The review starts by checking intra- and inter-model consistency of UML analysis models and consistency of business rules. Reviewing the models manually to identify any contradiction with the user requirements ensures consistency of the business rules. Intra-model consistency rules for UML diagrams at the syntax level are checked automatically based on the set of well-formedness rules that are given in the UML standard [35] and implemented in the PrUDE tool. Inter-model consistency of UML diagrams is checked manually based on the guidelines provided. For instance, guidelines for checking consistency between a sequence diagram and a class diagram associated with a use case include the following:
1. Ensure that the class of an object in the sequence diagram is represented consistently in the class diagram.
2. Ensure that every message received by an object in the sequence diagram is defined consistently as part of the class of the object in the class diagram.

[Figure 1: Major Steps in the Review Process. Starting from the use case model (user requirements), the figure shows four steps, each producing a revised artifact: 1. User Requirements Review: coverage (manual), consistency (manual). 2. Analysis Model Review: consistency (manual), well-formedness (PrUDE), validity (PrUDE). 3. Design Model Review: consistency (manual), optimality (manual), robustness (manual), traceability (manual), well-formedness (PrUDE), validity (PrUDE). 4. Test Data Review: coverage (manual), correctness (PrUDE). The revised test data are used in program testing.]

After consistency of the analysis model is checked and discovered defects are fixed, the revised model is imported into the PrUDE tool, where well-formedness and validity arguments are automatically checked.
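The two guidelines for checking a sequence diagram against a class diagram are applied manually in the proposed process. As a rough illustration of what they require, the following Python sketch encodes both diagrams as plain dictionaries and reports violations. The encoding and all names in the example are assumptions made for illustration; this is not how the PrUDE tool represents UML models.

```python
from typing import Dict, List, Set, Tuple


def check_sequence_against_classes(
        objects: Dict[str, str],
        messages: List[Tuple[str, str]],
        classes: Dict[str, Set[str]]) -> List[str]:
    """Return a list of inter-model consistency defects.

    objects  -- object name -> class name, as drawn in the sequence diagram
    messages -- (receiver object, operation name) pairs from the diagram
    classes  -- class name -> operation names, from the class diagram
    """
    defects: List[str] = []
    # Guideline 1: the class of every object must appear in the class diagram.
    for obj, cls in objects.items():
        if cls not in classes:
            defects.append(f"object {obj}: class {cls} missing from class diagram")
    # Guideline 2: every received message must be an operation of the class.
    for receiver, op in messages:
        cls = objects.get(receiver)
        if cls is None:
            defects.append(f"message {op}: receiver {receiver} is not declared")
        elif cls in classes and op not in classes[cls]:
            defects.append(f"message {op}: not an operation of class {cls}")
    return defects


# Hypothetical diagrams, for illustration only.
objects = {"o1": "Order", "c1": "Customer", "w1": "Warehouse"}
messages = [("o1", "addItem"), ("c1", "notify"), ("o1", "cancel")]
classes = {"Order": {"addItem"}, "Customer": {"notify"}}

for defect in check_sequence_against_classes(objects, messages, classes):
    print(defect)
```

The sketch only compares names; checking that a message is defined "consistently" (parameter types, visibility) would need a richer model of operations.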
Review of design models: A design model is obtained from the analysis model by successive refinement steps. Design traceability is documented by describing changes made to the analysis model in order to obtain the design model. Design traceability documentation is produced by a designer and challenged by a reviewer. Review of a design model is performed manually and consists of checking consistency, traceability, robustness, and optimality arguments.

Review of test data: The artifacts submitted to the reviewer consist of test cases generated from the model and expressions used to generate them. The role of a reviewer is to check correctness of the expressions by checking their accuracy in representing the system specification. The reviewer must check that the coverage criteria for specification-based testing strategies used to generate the test cases are met. The revised test cases are then sent back to the tester, who uses them in testing the program.

Table 1 summarizes review activities that can be performed in the review process.

Table 1: Summary of Review Activities ((*) indicates activities to be automated in future work)

Activities                                     Automation
Consistency, completeness of use cases         Manual
Well-formedness                                Automatic
Consistency of business rules                  Manual
Consistency across diagrams                    Semi-automatic (*)
Validity
  - Semantic generation                        Automatic
  - Business rules translation                 Manual (*)
  - Type checking                              Automatic
  - Model checking                             Automatic
  - Proof checking                             Semi-automatic (automatic for simple proof obligations)
  - Error trace back                           Manual (*)
Traceability                                   Manual (*)
Optimality                                     Manual
Robustness                                     Manual
Test case generation                           Semi-automatic
Test data review (coverage, correctness)       Manual
Test execution                                 Automatic

Most of the steps in the process can be automated, whereas some complex aspects, such as the refinement and correctness-checking activities, cannot be fully automated and hence, rely on human guidance and ingenuity.
We argue that these aspects should be reviewed using informal arguments. For instance, for a given correctness argument that cannot be checked automatically, a reviewer may provide and record informal arguments that are challenged using a carefully designed review procedure.

3 Feasibility Study based on a Patient Document Service (PDS)

In order to demonstrate feasibility of our approach, we performed a study based on a critical system that provides a secure patient document service (PDS). In this section, we describe the setup of the study, the results obtained, and present some examples of review activities and defects discovered by the reviewers involved in the study.

3.1 Setup and Results Achieved

The study involved seven students participating in directed studies at the graduate level. All of them have a strong background in UML and OCL, and some of them have several years of industrial experience either as a programmer or a tester. Three of them were assigned the role of reviewer. The remaining four were assigned the following roles: requirements and design specifications; implementation; test case generation; and translation of OCL expressions into PVS (this role was assigned to the student who has a strong background in PVS and OCL; others have little exposure to formal methods). The objective of the study was to evaluate feasibility of our approach by measuring the proportion of defects detected during the review, and assess its cost effectiveness by measuring the effort required to detect them. We did not inject any defect into the models; instead, we reviewed every new document before every review meeting to explore the number and kinds of errors known before the review. Before starting the review process, the review team attended a short tutorial on PVS and the PrUDE tool and a briefing on the review technique.
The use case model consists of eight use cases; the most critical use case was selected for the study. The analysis model consists of business rules, six sequence diagrams, a class diagram, and a statechart diagram. The design model consists of six sequence diagrams, a statechart diagram, a class diagram and a collaboration diagram describing the subsystems and their links, and a design traceability document. We used a restricted test set consisting of fifteen expressions and twenty test cases. Table 2 summarizes the quantitative results of the study. The size of the study material and the number of participants do not allow us to draw statistically significant conclusions based on quantitative data. Yet, the obtained results and the kinds of defects discovered are promising and consistent with the theoretical expectations. Hence, we discuss the results of the study qualitatively, rather than quantitatively. We noticed that the efficiency and cost effectiveness of defect finding vary significantly based on several factors: the kinds of defects; whether they are detected manually or automatically; whether the detection method follows precise rules, or is based on previous experiences and intuition, or a combination of both; background of reviewers; and the size and complexity of the requirements.

Table 2: Quantitative Results of the Feasibility Study

Category of defects                      1       2       3       4        5
Number of defects in initial document    30      5       10      15       8+2
Average detection time per defect        30s     5min    30min   < 1min   < 1min
Detection rate                           100%    100%    50%     100%     100%

Based on the cost and ease of detection, we identify five categories of defects:
1. Defects discovered manually using precise and systematic guidelines, e.g. inter-model consistency between UML diagrams and test coverage analysis. All the defects belonging to this category were easily and rapidly discovered by the reviewers.
2. Manually detected defects that require some logical thinking and for which no clear guidelines were given, e.g. consistency of business rules. These defects were all discovered, but they required more time than those in the first category.
3. Manually detected defects requiring some intuition and experience, and for which no strict guidelines were provided. Detecting defects belonging to this category took more time, and only half of them were detected.
4. Defects discovered automatically using the PrUDE tool, e.g. well-formedness defects. All defects in this category were detected easily and very quickly.
5. Defects related to validity that were discovered using the PrUDE tool but required some prior intuitive work by the reviewers to define appropriate conjectures. Identifying conjectures, and discharging them after they are translated into PVS, was straightforward. Narrowing the scope of the conjectures and focusing only on the relevant ones, however, was difficult. The results also varied depending on the competence of the reviewers. Prior to the review, we identified eight conjectures worth checking. One of the reviewers identified two additional interesting conjectures. Each of these conjectures was checked using PVS proof strategies implemented in the PrUDE tool in less than a minute.

3.2 Summary of user requirements

The main functionality of the PDS system is to provide secured access to patient medical records by authorized users. Actors involved in this system are patients, relatives and friends of patients, doctors, and system administrators. The main information to be secured is medical records of patients. A patient may choose a family doctor who is automatically granted the right to read and modify medical records of the patient. Only authorized doctors can read or modify a medical record, and every doctor is solely accountable for the modifications (s)he makes to the medical record database.
The system is expected to enforce this accountability. An authorized doctor is a registered doctor that a patient has chosen either as his family doctor or as guest doctor, e.g. due to unavailability of the family doctor. A patient is the only person that is allowed to choose his own doctor. A patient may have read access to his own medical record, but (s)he cannot modify it. He may grant read access to his friends and family members. The site administrator is the only person who can create, delete, read and modify a patient record. The system is required to provide security properties, i.e. integrity, confidentiality, and availability.

3.3 UML Models and Business Rules

To illustrate the feasibility of our approach, a security critical use case, namely the Login use case, is considered. Some selected artifacts of the analysis model for the Login use case are discussed below. The sequence diagram shown in Figure 2 describes a new user login scenario. A new user needs to register with the document server DocProvider before being able to login and access medical records. If the login is successful, a session object carrying the user data is created and will perform operations on behalf of the user during the login session. The session object is automatically destroyed when the user logs out.

[Figure 2: A Sequence Diagram for a New User Login Scenario. Lifelines: p : Person, dp : DocProvider, s : Session. Messages: register(), reg_Ok(), login(), [NoAccept] login_Nok(), [accept] login_Ok(), create(), sendRequest(), recvResult(), logout().]

The class diagram shown in Figure 3 describes a view of classes of objects participating in the Login use case. Users of the system are specified by classes Patient, Doctor, Administrator and Friend defined as subclasses of class Person that specifies a set of common attributes. The class DocProvider manages access to medical records described by the class MedicalRecord.
The SecurityProfile of a user is defined as a set of instances of AccessRight associated to the class Person.

[Figure 3: A UML Class Diagram for the PDS System. Classes shown: Person (name, password, userid, address: string; age, ssn: nat; reg_OK(), recvResult(), login_OK(), login_Nok()) with subclasses Patient, Doctor, Friend and Administrator; MedicalRecord; SecurityProfile (owner: Person); AccessRight (read, modify, delete, create, addDoc, removeDoc, addFriend, removeFriend: boolean); DocProvider (mode, connection, service, securityStatus: boolean; register(), login(uid:string, pwd:string), sendRequest(req:Request), recvResult(res:Result), close(), abnormalClose(), detectViolation(), analyzeViolation(), backToNormal()); Session (owner: Person; sendRequest(), logout()). Association ends shown: myFriend, myDoctor, owner, access, and the {set}-valued ends records, securityDirectory, right, users and sessions.]

Figure 4 shows a statechart diagram describing dynamic behavior of the DocProvider class. The state machine starts in the initial state Idle where security parameters are initialized. Then, it moves to a basic operating state NormalOperation, and waits for requests from users. When a request is received, the security profile of the user is checked and the request is either served or rejected. NormalOperation is a concurrent state in which requests for connection and other requests can be processed simultaneously.

[Figure 4: A UML Statechart Diagram for the DocProvider class. State names appearing in the diagram: DocumentServerState, Init, Idle, NormalOperation, Connecting, Connected, Waiting, Processing, Servicing, AbnormalOperation, SecurityViolation, Recovery. Transition labels: register(); login(uid,pwd) [!accept]; login(uid,pwd) [accept] / createSession; logout(session) / clearSession; request(req) [reqOK]; request(req) [!reqOK]; execute(req); detectViolation(); [recoverable]; [!recoverable]; backToNormal().]
Business Rules: UML diagrams are augmented by a set of business rules that are specified using the Object Constraint Language (OCL) [47]. In the PrUDE framework, we consider two sets of OCL expressions:

1. Set of expressions specifying the constraints that must be enforced by an object or a group of related objects, or operations.

2. Set of expressions provided by specifiers to make UML graphical constructs more meaningful by complementing the underlying semantics. For instance, for the statechart diagram shown in Figure 4, the specifier should define what the state Idle or the action createSession means.

Let us look at some examples of business rules:

Rule 1: A patient cannot create, delete or modify his own medical records.

    context Patient inv
      self.profile.right → forAll(r | not (r.create or r.modify or r.delete))

Rule 2: A doctor cannot create or delete a medical record.

    context Patient inv
      self.myDoctor.profile.right → forAll(r | not (r.create or r.delete))

Complementary semantics are provided for graphical constructs in the form of predicates. For instance, consider the transition login() from state Idle to state Connected in Figure 4. To describe the transition, we define predicates for the states Idle and Connected, the guard condition accept, and the action createSession. The predicate predConnected states that the state Connected is active when DocProvider is in its normal operating mode, has established a connection, and has at least one active user.

    context DocProvider
      predIdle() : Boolean
        self.mode = true and self.connection = false
      predConnected() : Boolean
        self.mode = true and self.connection = true and self.users→notEmpty

The predicate predAccept corresponds to the guard condition accept, and ensures that for a login to be successful, there must exist a security profile in the security database that matches the profile of the requesting user.
Predicate predCreateSession corresponds to a postcondition related to the action createSession and states that after a successful execution of the login() method, the cardinality of the set of active sessions is increased by one.

    context DocProvider::login()
      predAccept(uid: string, pwd: string) : Boolean
        self.securityDirectory → exists(sp | sp.owner.userid=uid ∧ sp.owner.password=pwd)
      predCreateSession() : Boolean
        self.sessions → size = old self.sessions → size + 1

3.4 Examples of Review Activities

We illustrate some of the main steps of the review process by presenting examples of defects discovered during the feasibility study.

3.4.1 User requirements

Review of user requirements involves checking consistency and completeness. The Login use case is described by two flows of events: a flow of events describing a login scenario for an existing member, and a flow of events describing a login attempt by a new member. During the review, it was discovered that an additional flow of events must be considered to have complete coverage of all the scenarios. Four variants of the primary flow of events must be considered: Administrator login, Doctor Login, Patient Login, and Friend Login. Several inconsistencies in the user requirements were discovered during the review process. For instance, the requirement stating that "only authorized doctors can read or modify a medical record" was found to be inconsistent with the requirement stating that "the site administrator is the only person who can create, delete, read and modify a patient record." This led to the following revised requirements: "only the site administrator and authorized doctors can read or modify a record" and "only the site administrator can create and delete a record."

3.4.2 Analysis models

As previously mentioned, a review of the analysis model starts by checking consistency of the model: intra- and inter-UML diagram consistency, and consistency of business rules.
Internal consistency of UML diagrams, at the syntactic level, is covered by the well-formedness rules, which can be checked automatically by using the PrUDE tool. Consistency across diagrams partly depends on the development process adopted. The reviewer manually checks consistency by following the guidelines provided (see section 2.2.2). For example, we quote the following from a reviewer's report on consistencies between class and sequence diagrams, and class and statechart diagrams:

1. The operations sendRequest(req:Request) and recvResult(res:Result) in class DocProvider may not be necessary in the class diagram. They are not called in the Login use case. Rather, the sendRequest() method of the class Session and the recvResult() method of the class Person are used.

2. The operation logout() of the class DocProvider is missing from the class diagram.

Consistency of business rules is checked manually by reviewers. For instance, one of the reviewers established that the analysis model fails to consistently describe the user requirement stating that a patient must not be able to modify his own record. A patient can be a doctor by profession, in which case he can choose himself as a "guest" or family doctor. Consequently, he grants himself the right to modify his own record, as the above system design does not prevent this. Hence, the following business rule was added.

Rule 3: A patient can choose a registered doctor, except himself, as a family or a "guest" doctor.

    context Person inv
      (self.asType(Patient) ∧ self.asType(Doctor)) ⇒ (self.myDoctor → excludes(self))

3.4.3 Design models

Successive refinements of an analysis model result in a design model. A design model of the PDS system consists of six sequence diagrams, a statechart diagram, and a class diagram. Design traceability documentation was also provided. Due to space limitation, we discuss only the design class diagram shown in Figure 5.
Review of the design model primarily involves checking consistency, robustness, optimality and traceability arguments manually.

[Figure 5: Design Diagram of the Patient Document Service. Classes shown: UserManager (name, password, userid, address: string; age, ssn: nat; role: {Patient, Doctor, Friend, Administrator}); DirectoryService; MedicalRecord; SecurityProfile (owner: Person); AccessRight (read, modify, delete, create, addDoc, removeDoc, addFriend, removeFriend: boolean); SecurityManager (mode, connection, service, securityStatus: boolean; register(), init(), login(uid:string, pwd:string), service(req:Request, res:Result), monitor(), close()); Session (owner: Person; sendRequest(), logout()). Association ends shown: directory, users, access, securityDirectory, the {vector}-valued ends records and sessions, and the {seq}-valued end right.]

To check the traceability argument, the reviewer examines the relationships between the structural and behavioral elements defined in the specification and the design documents. For instance, let us consider the design class diagram shown in Figure 5. It is a refinement of the analysis class diagram shown in Figure 3. Instead of having several classes for different users of the system, e.g. Person, Patient, etc., there is only one user class, namely the class UserManager. The UserManager class specifies the same set of attributes as the Person class, in addition to the role attribute that corresponds to the specific role played by the user. The class SecurityManager is a new class that performs necessary security checks before processing a request. There is also a standard directory service represented by the class DirectoryService.
Since the configuration of the model has changed significantly, it is necessary to ensure design traceability by showing that all information mentioned in the abstract model can be found in the design model. For instance, the designer considers that there is a direct correspondence between class DocProvider in the abstract model and class SecurityManager in the design model. A similar correspondence exists between Patient, Doctor, Friend, Administrator and UserManager. The correspondence is documented by providing retrieve functions that relate abstract and concrete representations. We use the following notation for the retrieve function: retr : [Rep → Abs], where Abs is the abstraction and Rep is a representation. For instance, for the class SecurityManager, the following retrieve function is defined:

    retr: [SecurityManager → DocProvider]

    context DocProvider
      sm: SecurityManager
      inv self = retr(sm) ⇒
        (self.records = retr(sm.records) ∧
         self.securityDirectory = retr(sm.securityDirectory) ∧
         self.users = retr(sm.users) ∧
         self.sessions = retr(sm.sessions) ∧
         self.mode = retr(sm.mode) ∧
         self.connection = retr(sm.connection) ∧
         self.service = retr(sm.service) ∧
         self.securityStatus = retr(sm.securityStatus))

A retrieve function on a class is defined in terms of retrieve functions on its attributes. A retrieve function can be as simple as the identity function, or more complex, depending on the data types involved. For instance, the above retrieve function establishes a correspondence between the records attributes in the classes DocProvider and SecurityManager. However, their data types are different (see the respective class diagrams). The abstract records attribute is defined as a set of MedicalRecord, whereas the refined attribute is defined as a vector of MedicalRecord, e.g. an array.
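The difference between the two representations can be made concrete with a small sketch. The Python fragment below is only an illustration under stated assumptions: medical records are stood in for by strings, the abstract attribute is modeled as a frozenset and the refined one as a list, and the names retr_records and represent are hypothetical. It relates a vector to the set of its elements and, in the other direction, arranges a finite set into a vector.

```python
from typing import FrozenSet, List

# Hypothetical stand-in for the MedicalRecord class: a record is just an identifier.
MedicalRecord = str


def retr_records(vector: List[MedicalRecord]) -> FrozenSet[MedicalRecord]:
    """Map a concrete vector of records to the abstract set of its elements."""
    return frozenset(vector[i] for i in range(len(vector)))


def represent(records: FrozenSet[MedicalRecord]) -> List[MedicalRecord]:
    """Arrange a finite set into a vector (one of many possible arrangements)."""
    return sorted(records)


# Example values (assumed, for illustration only).
concrete = ["rec-b", "rec-a", "rec-b"]   # a vector may hold duplicates
abstract = retr_records(concrete)         # duplicates and order are forgotten
round_trip = retr_records(represent(abstract))
```

In this sketch several vectors map to the same set, so the mapping is many-to-one; going from a set back to a vector and retrieving again yields the original set.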
In this case, the retrieve function for the attribute records is defined as follows:

    retr(sm.records) = {sm.records[i] | 0 ≤ i < sm.records.size}

In order to establish correctness of the representation, an adequacy proof obligation is stated and discharged by the designer. The adequacy proof obligation is provided in the design traceability documentation. The role of the reviewer is to review the supplied proof. The following proof obligation states that the retrieve function must be total over the abstract values, i.e. every abstract value must have at least one representation:

    context DocProvider inv
      self → forAll(dp | (SecurityManager → exists(sm | retr(sm.records) = dp.records)))

The proof obligation is discharged by providing the following informal constructive argument: Given a finite set, it is always possible to arrange the elements of the set into an array. The set represents the collection of elements associated to the array.

Jones [23] encourages the use of informal constructive arguments to discharge simple proof obligations. Alternatively, the PVS prover can be used to discharge the proof obligations. However, to make this option more attractive to reviewers, we need to identify and rigorously define systematic mechanisms characterizing the UML refinement process that can be used to define and implement efficient proof strategies. This will be dealt with in future work.

Although the data representation chosen by the designer seems adequate, the reviewer may raise some concerns about its optimality. From the requirements, it appears that the attribute records, where all medical records are stored, should allow efficient searching. The question is, would representing the records as a binary tree be more efficient than using a vector? An optimality issue raised explicitly by one of the reviewers is quoted as follows: Method create() is assumed missing in both SecurityProfile and Session classes. This may not be the case if create() is meant to be interpreted as instantiation through a constructor call.
Unless the designer assumed that it was intended as a factory method.

Some reviewers have raised a robustness issue: the patient is the only person allowed to choose his doctor. Consider the following: a patient has travelled abroad and suffers a serious accident. The authorized doctors listed in his record cannot reach him, and the patient is not in a condition to choose a local "guest" doctor.

4 A Framework for Model-based Verification

The verification scope of most of the conventional review techniques, with the exception of the cleanroom approach, which involves some formal aspects, is limited to a few arguments such as correctness, consistency, and completeness. None of them efficiently addresses the validity argument. Validity can be checked by using formal reasoning. The PrUDE platform is suitable for this purpose as it makes formal analysis more attractive to practitioners who are reluctant to delve into the mathematical details of formal verification. In this section, we present a framework for model-based verification and illustrate through examples how it can be used to address arguments such as validity.

4.1 Formalization of UML Notations in PVS

We begin by giving a brief overview of the PVS environment and formal semantic definitions for UML notations. Because of space restrictions, we present only an overview of semantic definitions for UML statecharts. Interested readers are referred to [41, 2] for more details.

4.1.1 The Prototype Verification System

The prototype verification system (PVS) [36] is a formalism consisting of a highly expressive specification language tightly integrated with a type-checker, a theorem-prover, and a model-checker. The PVS specification language (PVS-SL) is based on typed classical higher-order logic. Its type system contains basic types such as boolean, integer, real and type constructors for the set, tuple, record, and function types. A record type is a finite set of fields of general signature

    R: TYPE = [# a1: T1, ..., an: Tn #]

where the ai's are accessor functions and the Ti's are type expressions. The declaration

    F: TYPE = [D1, D2, ..., Dn → R]

models types of functions with domain D = D1 × D2 × · · · × Dn and range R, where the Di's and R are type expressions. Given a type T, the type of sets of elements of T is specified using one of the constructs pred[T] or setof[T], each of which is a shorthand for [T → bool]. The PVS type system has been augmented by predicate subtyping and dependent typing. Although subtyping makes type-checking more powerful by allowing stronger checks for consistency and invariance in a uniform manner, it renders type checking undecidable and results in the generation of proof obligations, called Type Correctness Conditions (TCCs). Many TCCs can be discharged automatically using the theorem prover, whereas the more involved ones may require user interaction.

PVS specifications are organized as a collection of theories representing specification modules. A theory may contain specifications of types, constants, axioms and theorems. PVS supports modularity and reuse by means of parameterized theories, making it possible to describe generic modeling elements. Our formal semantics consist of a set of theories corresponding to generic semantic definitions and theories corresponding to application-specific definitions. The generic theories are included in the PVS library, called preludes, and can be imported by the application-specific theories. The latter are automatically generated for the application under design.

4.1.2 Formalization approach

A great deal of work has been done on providing the mathematical basis for the concepts underlying OO modeling techniques using different approaches. In general, three major approaches can be identified [17]: supplemental, OO-extension, and method integration.
In the supplemental approach, semi-formal OO modeling constructs are replaced by more formal constructs, whereas in the OO-extension approach, a novel or an existing formal notation is extended with OO features, thus making it more compatible with OO modeling. These approaches have major limitations: they are not user friendly, and developers have to deal with a considerable amount of formal artifacts, which is a significant barrier to large-scale application of formal methods in the industrial setting. The OO-extension approach results in a rich body of formal notation, yet it introduces more complex semantics and suffers from a lack of supporting CASE tools [13]. Method integration is a more workable approach that integrates semi-formal notations with suitable formalism(s), thereby making them more precise and amenable to rigorous analysis. It allows developers to manipulate the graphical models they have created without having in-depth knowledge about the underlying formal artifact that is processed at the back-end.

Based on the method integration approach, we proposed semantics for a subset of UML notations [41, 2] using the PVS specification language [36] as the underlying semantic foundation. The informal semantic definitions provided in the UML standard document [35] are used as the basis of the formal semantics. Our work has focused on semantics of UML structural and behavioral models, namely the class, statechart, and interaction diagrams. These diagrams have been chosen because they provide a good coverage of system properties (structural and behavioral). Our approach can easily be extended to other UML constructs. This is among the issues to be investigated in our future work.

4.1.3 Semantics of UML statecharts

The steps towards the formalization of semantics of UML statecharts consist of defining a set of elementary predicates that describe relevant properties of system states or system operations.
The set of elementary predicates is then partitioned into elementary states and events. A state describes a condition of the system that has a non-zero duration. A clear distinction shall be made between a concrete state of the system and an abstract notion of state in statechart diagrams. We represent a concrete state by a record type V, whose fields correspond to state variables x1 . . . xn of type T1 . . . Tn, respectively, where T1 . . . Tn are type expressions. For the sake of simplicity, we define the Ti's as uninterpreted types in PVS.

    T1, T2, ..., Tn: TYPE
    V: TYPE = [# x1: T1, x2: T2, ..., xn: Tn #]

A transition is defined by a source state, a target state, a trigger event, a guard condition and an action. We represent in PVS the notions of event, state vertex, guard condition, and action as uninterpreted types. We represent transitions by defining a PVS record type Transition.

    Event, Vertex, Condition, Action: TYPE+
    Transition: TYPE+ = [# source: Vertex, trigger: Event, guard: Condition,
                           effect: Action, target: Vertex #]

We define three categories of predicates associated with, respectively, the notions of state vertex, guard condition, and action. The predicate associated with a state vertex corresponds to the condition that must hold for the state to be active. The predicate associated with an action corresponds to a condition that must hold after the execution of the action, and it can be assumed to be the postcondition of the action. The state and guard conditions are functions of the current value of the state variables, whereas the action postcondition is a function of both the current and the future values of the state variables. The record type VC, given below, combines both the current and next state information.
    VC: TYPE = [# current: V, next: V #];
    pred: [Vertex → pred[V]];
    pred: [Condition → pred[V]];
    pred: [Action → pred[VC]]

A transition is enabled if the event instance generated matches its trigger, its guard condition is fulfilled, and its source state is active. An enabled transition is eligible for firing. Firing a transition activates its target state and executes its action. The predicates enabled and fired describe, respectively, conditions for enabling and firing of a transition.

    tr: VAR Transition; v, v1: VAR V; vc: VAR VC; e: VAR Event

    enabled(e, tr, v): bool =
      pred(source(tr))(v) AND (trigger(tr) = e) AND pred(guard(tr))(v)

    fired(tr, v, v1): bool =
      pred(target(tr))(v1) AND pred(effect(tr))(vc)
      WHERE vc = (# current := v, next := v1 #)

4.1.4 PVS proof strategies

The ultimate goal of formalizing UML notations is to precisely specify and rigorously verify important system properties. Using the primitive proof rules of the PVS prover requires some expertise, and it can be quite tedious. Fortunately, PVS provides a mechanism for defining more powerful proof strategies, significantly improving proof automation. This allows checking of complex proofs in a single atomic step by hiding the tedious intermediary steps from the user. A PVS proof strategy is defined using the following template,

    (defstep name (required-parameters &optional optional-parameters)
      strategy-expression
      documentation-string)

where defstep is the keyword to define a strategy. The strategy itself is specified by providing a name, a proof expression, and a documentation string. We have identified and implemented some powerful proof strategies that allow full automation of checking system properties based on our semantic models [31]. These strategies are implemented in the PrUDE tool and executed in a batch mode.
For instance, for properties based on statechart diagrams, the following proof strategy is proposed:

    (defstep statechart-proof-strategy
      (then (auto-rewrite "user defined assumption1" "user defined assumption2" ...)
            (skosimp)
            (expand "ConfigurationPair")
            (grind)))

The predicates defined as complementary semantics of a statechart diagram represent assumptions on the system behavior defined by the specifier. These assumptions, stated as axioms, are collected and installed in the proof system as auto-rewrite rules using the auto-rewrite command, so that the PVS theorem-prover is able to search for these axioms automatically. The skosimp command replaces the universally quantified variables of the target formula with constants. The expand command expands a generic semantic function called ConfigurationPair that defines an abstraction of the current and next state configurations of the system. The grind command is a catch-all strategy that is frequently used to complete a proof branch or to apply all obvious simplifications until they no longer apply. First, it installs the rewrite rules along with all relevant definitions in the given sub-goal, and then carries out all the equality replacements, among other simplifications.

4.2 The PrUDE Platform

The Precise UML Development Environment (PrUDE) tool [42] has been developed to automate the model-based verification framework presented above. In the sequel, we discuss the main features of the PrUDE platform, namely, its foundation, automation, and V&V strategies. Independent of the feasibility study presented in Section 3.1, the PrUDE tool was applied to three case studies: a banking system [43], a temperature regulator software component [31], and a network reconfiguration protocol [44].

4.2.1 Notations and tools involved in PrUDE

The core notation used in the PrUDE platform is the UML [35].
UML provides an underlying methodology for specification and refinement and a graphical notation which contributes to communicability and friendliness; most importantly, UML is an international standard for object-oriented modeling. UML, however, is severely limited by the fact that its graphical constructs are not enough to achieve a complete and precise specification of a system. This is generally addressed by using the Object Constraint Language (OCL) [47] to specify additional constraints on objects in the model, such as invariants on classes and types, abstract definitions of operations and attributes, non-functional requirements, etc. However, the semantics of OCL is not mathematically defined, and hence, it does not provide the facilities required for rigorous analysis; at most, there is a set of type conformance rules. In order to achieve such objectives, we use PVS as a semantic foundation for our platform. PVS provides a rich semantic foundation and a collection of formal verification tools. A particular strength of PVS is its capacity to exploit the synergy between all these tools.

The PrUDE platform is automated by a tool suite consisting of a UML CASE tool integrated with a V&V environment that supports type-checking, model-checking, proof-checking, testing and well-formedness checking [42]. Model-checking and proof-checking are based on the PVS toolkit. The interface of PrUDE to a UML tool is based on XMI, as it provides an explicit model exchange format for UML based tools. Since any UML CASE tool is expected to export models in the XMI format, the PrUDE platform is independent of any UML tool vendor. This makes it possible to easily adapt the PrUDE tool to an existing software development environment.
[Figure 6: V&V Strategy Underlying the PrUDE Platform. Elements shown: UML Spec, OCL business rules, semantic conversion, OCL2PVS translation, PVS model, type-checking and well-formedness checking, validation/verification (model-checking, proof-checking), error feedback, valid UML model, code generation, test case generation, test cases, program, test execution and test coverage analysis.]

4.2.2 V&V strategy underlying the PrUDE platform

The V&V strategy shown in Figure 6 is followed in the PrUDE platform. A designer develops a model using a UML CASE tool and submits the model to the PrUDE tool, which automatically generates formal semantic models in the PVS-SL. Usually, a UML specification is accompanied by rules, e.g. invariants, pre- and post-conditions, and system properties specified in OCL expressions that are manually translated into PVS and integrated with the semantic models. Business rules can be inserted directly using a property editor. Next, well-formedness and consistency of the resulting model are checked based on the rules defined in the abstract syntax of UML constructs [35]. In the next step, the model is checked against the business rules by invoking the PVS toolkit. Business rules, expressed as PVS conjectures and theorems, are analyzed using model-checking or proof-checking. If an error is discovered, the reviewer goes back to the OCL business rules or UML models and fixes the error. The above process is iterated until a valid UML model is obtained. Using the valid UML model, the designer refines the model through subsequent steps and implements the system. The program code can be tested with the PrUDE tool using the UML specification. The UML model obtained after a series of V&V steps is used to generate test cases.

4.3 Review Activities Supported in PrUDE

A reviewer can check well-formedness and validity arguments using the PrUDE tool. This is done by importing the XMI file generated from UML models.
PVS semantic models are then automatically generated based on the XMI file. Business rules in OCL are manually translated into PVS and systematically integrated with the PVS semantic models using the property editor. The model is then checked based on well-formedness rules, whereas type-correctness is checked by invoking the PVS type-checker in batch mode. Finally, every system property is checked by invoking the PVS theorem prover. Figure 7 shows a snapshot of a PVS specification automatically generated from the UML statechart diagram shown in Figure 4 using the PrUDE tool. The lower window is a log area where reports generated from PVS tools are displayed.

[Figure 7: Semantic Model generated for the UML Statechart Using the PrUDE tool]

In order to check validity of the specification, the reviewer states and checks conjectures based on system requirements. The essential conjectures suggested by reviewers in the feasibility study are security requirements for authorization, authentication, accountability, and availability. In the following, we discuss an example of a conjecture proposed by the reviewers, which was not in the initial list of properties. It enabled us to discover a subtle flaw that will be discussed below. The conjecture is stated as follows:

Property 1: A user cannot perform a logout operation unless (s)he is connected.

The reviewer invoked the PVS prover to discharge the conjecture. The proof was unsuccessful and resulted in a counterexample as a PVS debugging message:

{−1} dsubvertex(Connected)=emptyset
{−2} State(Connected)
{−3} dsubvertex(Connected)=emptyset
{−4} defaultState(Connected) = Connected
[−5] tr!1= (# source := Connected, trigger := logout, guard := EmptyC, effect := clearSession, target := Connected #)
[−6] mode(v1!1)
[−7] connection(v1!1)
[−8] pred(EmptyC)(v1!1)
{−9} mode(v2!1)
[−10] logout(v1!1)
[−11] connection(v2!1)
|−−−−−−−
Rule?
The debugging message is expressed in the form of an unproved sequent with several antecedents and no consequent to be proved. In such a case, either there exists a conflict in the antecedents, or the antecedents are not sufficient to prove the sequent. Lines {−1} to {−4} refer to the simple state Connected (see Fig. 4). Line [−5] refers to a transition (labelled internally) tr!1 whose source and target are the state Connected, with triggering method logout, empty guard condition, and action clearSession. This corresponds to the self-transition associated with the state Connected. Lines [−6] to [−11] refer to the firing of transition tr!1. At this stage, the reviewer inferred that the firing of transition tr!1 leads to an inconsistent state, and decided to closely examine the transition and its meaning as defined in the statechart diagram. In a normal execution, the concurrent state Connecting contains a logical inconsistency. If we follow the processing of a user request to connect to the Document Server, we can determine the following operations:

• The thread responsible for user connection starts in the Idle state.
• If the thread receives a login request from an unconnected user, it remains in the Idle state.
• If the thread receives a login request with a valid user ID and password from an unconnected user, it enters the Connected state.
• After a user is connected, the thread responsible for user connections returns to the Idle state.
• When the thread in the Idle state receives a logout request from a connected user, it handles the request and remains in the Idle state.

These operations seem consistent with a running server. The transition that is logically inconsistent when compared to the implementation of the system is, as indicated by the counterexample, the transition from the Connected state to itself triggered by a logout request. In reality, a logout request from a user who is not connected should not be processed.
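The kind of check behind Property 1 can be illustrated with a small sketch. It is a deliberately simplified illustration and not the PVS semantics generated by PrUDE: it ignores state predicates and effects, and only asks whether the guard of a logout transition can hold while the user is not connected. The names `Transition`, `empty_guard`, `connected_guard` and `find_counterexample` are hypothetical and introduced here for illustration only.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Optional, Tuple

# A valuation assigns a boolean to each system variable (assumed: mode, connection).
Valuation = Dict[str, bool]


@dataclass(frozen=True)
class Transition:
    source: str
    trigger: str
    guard: Callable[[Valuation], bool]
    target: str


def empty_guard(v: Valuation) -> bool:
    # Hypothetical counterpart of EmptyC: places no condition on the valuation.
    return True


def connected_guard(v: Valuation) -> bool:
    # Hypothetical repaired guard: logout is only enabled for a connected user.
    return v["connection"]


def find_counterexample(
    transitions: List[Transition],
    variables: Tuple[str, ...] = ("mode", "connection"),
) -> Optional[Tuple[Transition, Valuation]]:
    """Search all valuations for a logout transition enabled while not connected."""
    for tr in transitions:
        if tr.trigger != "logout":
            continue
        for values in product([False, True], repeat=len(variables)):
            v = dict(zip(variables, values))
            if tr.guard(v) and not v["connection"]:
                return tr, v
    return None


# Self-transition on Connected with an empty guard, as in line [-5] of the sequent.
FLAWED = [Transition("Connected", "logout", empty_guard, "Connected")]
# The same transition with an explicit guard on the connection variable.
GUARDED = [Transition("Connected", "logout", connected_guard, "Connected")]
```

Under these assumptions, `find_counterexample(FLAWED)` returns a transition and a valuation in which `connection` is false, whereas `find_counterexample(GUARDED)` returns `None`. An exhaustive search over two boolean variables is only feasible because the model is tiny; the theorem prover reasons symbolically instead.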
This problem could occur if, for example, the implementation code did not properly set the connection property of a client after it has successfully logged in, but instead set it before completion of the connecting code. Although the detected error might seem trivial, it is an example of typical errors that can easily be missed during manual review.

Remarks: A similar irregularity arose in an application with two threads, one for handling local requests, and the other for handling client connections. The problem involves the actions of starting, stopping and restarting the thread that handles client connections. The logical inconsistency became visible when the administrator stopped the server thread and attempted to restart it. This problem was not discovered during the initial testing, since it was assumed that the user wants to change ports when starting and stopping the service. However, the inconsistency was discovered when the administrator shut down the server and a client still connected successfully. After several hours of debugging, the problem was found to be a missing statement that releases the port the server was bound to when the server is shut down. When the server is started, it is bound to a specific port, say port 5555, and clients request connections to this port. When the server is stopped, all sockets should be terminated properly and all resources freed; clients should not be able to connect. While the server thread was down, the server socket bound to port 5555 was not released, consequently creating an orphaned thread that the main application had no reference to. The solution was to add a statement that closes the server socket and frees the port. To summarize, the fact that the application successfully handles login requests when the server is stopped is a logical error. This is similar to the scenario where the system could successfully handle logout requests from a client that had not yet completed connection.
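The fix described in the remarks above can be sketched as follows. This is a minimal illustration using Python's standard `socket` module, not the code of the application discussed; the helper names `start_server`, `stop_server` and `can_connect` are assumptions, and an operating-system-chosen port stands in for port 5555.

```python
import socket


def start_server(host: str = "127.0.0.1") -> socket.socket:
    """Bind a listening socket; port 0 lets the operating system pick a free port."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, 0))
    srv.listen(1)
    return srv


def stop_server(srv: socket.socket) -> None:
    """Close the listening socket so that the port is released on shutdown."""
    srv.close()


def can_connect(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

With the closing statement in place, a client can connect while the server socket is listening and is refused once `stop_server` has been called. Omitting the `close()` call would leave the port bound, which corresponds to the behaviour observed in the remarks.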
We could make this problem more apparent by renaming the state Connected in the statechart diagram to ConnectingClient, or something similar, to indicate that the connection process takes some time.

5 Test Data Generation and Review

In spite of the progress that has been made in improving the level of automation of testing, test case generation still requires significant manual input, making the process time consuming and error prone, thereby raising the need for thorough checking of test data. We discuss our approach to test data generation and review based on UML models.

5.1 Model-based testing

Our goal is to use UML models as the basis of program testing. There are a number of publications reporting work done in the area of specification-based testing [25, 9, 39, 4]. The objective of testing a program is not only to check that it behaves properly, but also to check that it behaves as originally required. The latter is crucial, as it is possible to write a program without error, but which behaves differently from what was stated in user requirements. Using a formal model as a basis of test case generation contributes significantly towards that goal. Our testing approach consists of validating the UML model based on its formal semantics and system requirements. When a valid UML model is obtained, we generate test cases from the various constraints associated with model elements, e.g. classes, states, and operations. UML consists of nine standard diagrams, each of which may be used for testing to various degrees and for different purposes. We describe the transition test strategy based on statecharts and refer interested readers to [21] for test strategies based on other UML diagrams.

5.1.1 Transition-based Testing

A transition test model consists of the set of transitions associated with a statechart diagram. It allows the generation of test cases at the method and class levels.
An event in a UML statechart diagram corresponds to a method call. The activation of a transition involves two predicates, enabled and fired, as defined in Section 4.1. The predicate enabled defines the enabling condition for the transition, whereas the predicate fired specifies the resulting condition after the transition is completed. This pair of predicates can be considered as a pair of pre- and postconditions associated with the corresponding method, and can be used to generate suitable test cases for the method. The characteristic formula associated with each pre-postcondition pair is as follows:

∀v : V • ∃v1 : V • enabled(e, tr, v) ⇒ fired(tr, v, v1)

where tr is a transition, e a trigger event, and V a record type that encapsulates all system variables. Since the same method can be called several times, a transition provides only a partial pre-postcondition. The global pre-postcondition is obtained from the conjunction of the partial pre-postconditions. Test cases are generated from a partial pre-postcondition pair by decomposing the precondition into disjunctive normal form (DNF), yielding elementary sub-expressions. Next, the sub-expressions are refined into executable expressions from which suitable test cases are generated using the domain test matrix technique. The PrUDE tool automatically decomposes and generates the abstract expressions, whereas the refined expressions are manually generated. PrUDE also provides a spreadsheet-like table that assists users in applying the domain test matrix technique. For Java programs, it provides a test execution component to which the generated test cases may be submitted and executed automatically.

5.1.2 Example of Test Data Generation

We present the testing of the method login() of the class DocProvider (see Fig. 4) using the transition test strategy.
There are two transitions that involve the method login(): a transition from the state Idle to the state Connected, and the self-transition on the state Idle. Based on the predicates associated with the elements of each transition (see Section 3.3), we identify two pre-postcondition pairs associated with the method login():

DocProvider::login(uid:string, pwd:string)
pre: predIdle() and predAccept()
post: predConnected() and predCreateSession()

DocProvider::login(uid:string, pwd:string)
pre: predIdle() and not predAccept()
post: predIdle()

Test cases are generated from every pre-postcondition pair using an extended form of domain analysis of object variables, exploiting decision trees and class attribute structures. The conventional domain analysis technique is only appropriate for expressions involving primitive variables. For instance, from the first pre-postcondition pair above, the PrUDE tool generates the following abstract DNF expression consisting of five sub-expressions:

dp:DocProvider, sp:SecurityProfile, uid,pwd:string
(1) dp.mode=true
(2) dp.connection=false
(3) dp.securityDirectory.includes(sp)
(4) sp.owner.userid=uid
(5) sp.owner.password=pwd

Six test cases are generated from these expressions. A test case is specified by assigning values to input variables and specifying expected output. The input variables correspond to the state variables and the parameters of the method under testing. Only input values that make the precondition true are considered. Expected output, corresponding to the postcondition, is always equal to true in that case. We describe an example of a test case generated from a successful login of a user with ID alex and password camry.
The test case, labelled tc1, is given as follows:

tc1 = (Input=(dp1,sp1,uid="alex",pwd="camry"); Output=True)

where dp1 and sp1 are instances of DocProvider and SecurityProfile, respectively:

dp1:DocProvider, sp1:SecurityProfile, ac1: AccessRight
dp1 = (mode=True, connection=False, service=True, securityStatus=False, securityDirectory={sp1})
sp1 = (owner=p1, right={ac1})
ac1 = (read=True, modify=False, create=False, delete=False, addfriend=True, addDoctor=True)
p1 = (name="Alex", userid="alex", password="camry", age=20, address="40 Bay St", ssn=1234567).

5.2 Test data review

The review of test data consists of reviewing expressions used to generate test cases, and checking that the coverage criteria corresponding to the strategies used are met. The coverage criteria considered at this level are specification-based testing criteria. For instance, for the transition test strategy, we define three coverage criteria that must be checked manually by the reviewer: transition coverage, DNF coverage, and condition coverage. The transition coverage criterion is defined in terms of the state machine of a class. A tester should test every transition in the state machine at least once. Transition coverage is analogous to statement or branch coverage at the code level. The precondition coverage criterion requires that every DNF involved in a precondition is covered by at least one test case. A DNF consists of one or more elementary boolean conditions. A DNF criterion is based on the rationale that each condition should be tested independently without interference from other conditions. Thus, the test set must include at least one test case that makes all conditions true and test cases that falsify each condition at least once. Test case expressions, e.g. pre- and post-conditions, generated using the PrUDE tool are abstract expressions derived from the specification.
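The coverage rationale above (one test case that makes all conditions true, plus test cases that falsify each condition) can be sketched for the five sub-expressions of login(). This is an illustration under stated assumptions, not the domain test matrix technique implemented in PrUDE: the classes are reduced to the attributes used by the precondition, the names `login_conditions`, `precondition` and `candidate_inputs` are hypothetical, and the values "bob" and "wrong" are invented falsifying inputs. Reading the six test cases mentioned earlier as one all-true case plus five falsifying cases is likewise an assumption.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Person:
    userid: str
    password: str


@dataclass(frozen=True)
class SecurityProfile:
    owner: Person


@dataclass(frozen=True)
class DocProvider:
    mode: bool
    connection: bool
    security_directory: Tuple[SecurityProfile, ...]


def login_conditions(dp: DocProvider, sp: SecurityProfile, uid: str, pwd: str) -> List[bool]:
    """Evaluate the five elementary conditions, in the order (1) to (5)."""
    return [
        dp.mode is True,
        dp.connection is False,
        sp in dp.security_directory,
        sp.owner.userid == uid,
        sp.owner.password == pwd,
    ]


def precondition(dp: DocProvider, sp: SecurityProfile, uid: str, pwd: str) -> bool:
    """The precondition holds when every elementary condition holds."""
    return all(login_conditions(dp, sp, uid, pwd))


def candidate_inputs() -> list:
    """Return the all-true input (the data of tc1) followed by one variant per condition."""
    p1 = Person(userid="alex", password="camry")
    sp1 = SecurityProfile(owner=p1)
    dp1 = DocProvider(mode=True, connection=False, security_directory=(sp1,))
    base = (dp1, sp1, "alex", "camry")
    variants = [
        (replace(dp1, mode=False), sp1, "alex", "camry"),            # falsifies (1)
        (replace(dp1, connection=True), sp1, "alex", "camry"),       # falsifies (2)
        (replace(dp1, security_directory=()), sp1, "alex", "camry"), # falsifies (3)
        (dp1, sp1, "bob", "camry"),                                   # falsifies (4)
        (dp1, sp1, "alex", "wrong"),                                  # falsifies (5)
    ]
    return [base] + variants
```

Each variant changes a single input so that exactly one condition fails, which matches the stated rationale that each condition is tested without interference from the others.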
In order to generate test cases, the tester needs to provide a concrete implementation for these expressions in the target programming language. For instance, Java expressions corresponding to the five DNF sub-expressions for the method login() given above are as follows:

(1) mode==true
(2) connection==false
(3) securityDirectory.contains(profile)
(4) uid.equals((profile.getOwner()).getUserid())
(5) pwd.equals((profile.getOwner()).getPassword())

Although the expressions look very simple, they are still error prone. The role of the reviewer is to check whether they are correct with respect to their specification, i.e. the abstract expression.

6 Related Work

6.1 On Using Correctness Arguments

A great deal of research work has been done on the use of correctness arguments in structured reviews. Closely related to our approach is the work of Parnas and Weiss on Active Design Review (ADR) [37]. The ADR approach is guided by questionnaires provided to the reviewers by the authors of review documents. Based on the ideas of the questionnaire, Britcher [5] later proposed an approach that combines the strength of formal correctness arguments with informal structured review. Four correctness arguments, namely algebra, topology, invariance and robustness, are examined using the questionnaire based on the ADR approach. In our case, we define additional arguments that broaden the scope of the review process, thereby increasing both the number of potential defects that may be discovered and the effort required. In contrast to our approach, the cleanroom process [30], developed at IBM, puts a strong emphasis on interactive proof-checking, which is used as an alternative to unit testing. The software is developed and validated incrementally through successive refinement steps.
The stepwise refinement that contributes significantly towards the efficiency of the cleanroom process is a source of its main weaknesses because of the inherent complexity of formal verification. Scenario-based reading (SBR) [3] is an extension of ADR that uses guided scenarios to describe concretely how to find specific kinds of defects, and what to look for in the exhibits. Through a controlled experiment, Laitenberger et al. [26] have established that perspective-based reading (PBR), a particular kind of scenario-based reading, is more efficient than checklist-based reading (CBR) for detection of defects. PBR supports the reading of a document from the perspective of different stakeholders, e.g. designer, implementer, tester, etc. Their experimental material is based on UML and emphasizes the importance of defining new inspection approaches for object-oriented models, particularly the graphical ones [27]. Our work is closely related to this approach because the foundation of our review techniques is the ADR. However, their approach focuses on checking solely completeness and consistency of the UML diagrams. No information is given regarding the checking of arguments such as model validity. Our framework allows the reviewer to express conjectures that can be translated into formal expressions and checked against the model to evaluate its validity. In [1, 10], Dunsmore et al. propose a systematic, abstraction-driven technique for inspection of object-oriented code. The approach enables inspectors to read the code systematically and create an abstract specification for each method as they read it. Our approach can be considered as a combination of the abstraction-driven and use-case techniques supported with formal verification. The approach proposed by Thelin et al. [40] is similar to ours as the idea of inspections is organized around analysis models such as use cases and sequence diagrams. 
They conducted an empirical study on usage-based reading using use cases as units of review. Two groups of reviewers, one reviewing a set of use cases prioritized in terms of their importance, and the other reviewing the same set of use cases in random order, participated in the study. It is concluded that reviewers in the group that reviewed the prioritized use cases are more efficient in detecting faults.

6.2 On Using Visual Notations

Integrating semi-formal visual notations and formal methods has been an important research topic, and a significant amount of work has been performed. Heimdahl et al. [20] defined a formal semantics for a visual language called Requirements State Machine Language (RSML) and used it for analyzing consistency and robustness of requirement specifications. UML statecharts, which are used in our platform, and RSML are very similar: both languages originate from Harel statecharts. Our work, however, uses other UML notations, such as sequence and class diagrams in addition to statecharts, thus allowing description of a wider range of system properties. Easterbrook et al. [11] reported on three case studies consisting of a selective and lightweight application of formal methods to system analysis. We adopt a similar principle and use the UML design models as a basis of implementation. Formal semantics generated at the back-end are used for rigorous analysis to improve the quality of the baseline model. UML has established itself as the most popular visual modeling notation since its inception. Not surprisingly, a significant amount of research work has been undertaken towards improving the precision of UML by providing a mathematical basis for its underlying concepts. Since the inception of UML, several researchers have been working on its formalization. In most cases, the work exclusively focuses on a specific subset of the UML notations, e.g.
on static structural models such as class diagrams and object diagrams [16, 13], or on dynamic behavioral models like sequence diagrams [6, 8] and statechart diagrams [34, 28]. Most of the work on UML formalization focuses on semantic definition at a general and abstract level but does not provide any concrete guidance for practitioners. In our case, we provide more detailed and concrete semantic definitions for UML notations, along with guidelines for their application to a practical development process. Our formalization effort is tool-centered and application-oriented. In this respect, our work is very close to that of Betty Cheng et al., who have proposed, and used in practical settings, a general framework for formalizing a subset of UML diagrams based on a homomorphic mapping between corresponding meta-models and a corresponding tool named Hydra [33]. Model-based verification is a process for identifying and correcting errors. It integrates established modeling techniques, formal methods, and model checking approaches into a systematic software engineering practice. Gluch et al. [19] present a model-based verification technique for upgrading dependable systems. Engels et al. [12] propose a similar approach for verification and validation of dynamic properties of concurrent systems by translating UML models into semantic models in CSP and analyzing them using the model checker FDR [15]. A new trend in model-based verification tools, named active software tools, uses artificial intelligence techniques to assist and guide developers. WayPointer is an agent-based environment developed by a company named Jaczone that provides context-based support to designers in checking consistency and managing traceability among UML models [22]. Liu and colleagues introduced a rule-based environment that can be integrated with UML CASE tools to provide on-the-fly inconsistency management [32].
This enhances the basic consistency-checking scheme provided by existing UML CASE tools. In [7], a constraint checker (CC) for OCL expressions is presented. Constraints are translated into well-defined modeling rules, representing the knowledge base of an expert system, which are used to verify UML models. In the future, we plan to automate several tasks in the PrUDE tool using active technology (see Table 1). Another aspect of model-based verification that has been the focus of intensive research is specification-based testing. Briand et al. [4] propose a model-based testing methodology for object-oriented systems and discuss testability and automation issues. Test requirements are derived from analysis models and the benefits of using early artifacts are highlighted. Stocks et al. [39] developed a testing framework based on a similar approach. Doong and Frankl [9] propose the ASTOOT approach to test object-oriented programs by using algebraic specifications. Kung et al. [25] present an approach in which state machines are constructed from source code by combining reverse engineering and symbolic execution methods. We not only emphasize the importance of specification-based testing, but also argue that the model used for test case generation is subject to errors, and hence we suggest formal validation of the model and manual review of test expressions generated from the model before using them for test case generation.

7 Conclusion and Future Work

7.1 Conclusion

Though review can be quite effective in finding deficiencies and bugs in program code, it should not be considered as a replacement for other techniques such as formal verification and testing. For instance, testing is more practical than review for verification tasks related to system integration, performance analysis, reliability assessment or user interface validation.
Formal reasoning may significantly improve the level of precision and rigor of a software product, but both testing and formal reasoning may involve high costs. This work builds on the strengths of these techniques by developing an efficient and cost-effective integration of a V&V framework with structured review. We show how formal analysis can be used effectively to supplement and widen the scope of structured review. The aim of developing the PrUDE tool is to increase the level of automation of the analysis process in order to reduce the underlying difficulties and costs. We argue that informal structured review is a solution to the aspects of rigorous analysis that cannot be automated. However, for highly critical aspects, the cost of performing rigorous analysis is justifiable.

7.2 Future Work

The current version of the PrUDE tool has certain limitations. It expects the developers and reviewers to be familiar with the OCL, and to use this notation in expressing business rules and conjectures. In the future, the PrUDE tool will be extended with automatic translation of OCL expressions into PVS. The format of error messages from failed proof checking is another major shortcoming of the current version of the tool. These issues are mainly implementation-related and will be addressed in the future. The resulting PVS log messages use the vocabulary of the UML modeling elements in the system model. In the future, we will implement an intelligent parser that interprets the PVS error messages and translates them into understandable text. This is highly non-trivial but doable for some very restricted classes of properties in specific settings, e.g. safety properties expressed as an invariant on a particular statechart. Another consideration is increasing the level of automation of model-based verification. In the future, we will continue to investigate how this can be achieved for some of the most error-prone steps of the development process.
One such area that will retain our immediate attention is the refinement process, which is one of the most complex aspects of the design process. The formal semantics proposed in this work is based on the standard UML semantics defined by the OMG. It may happen, however, that the semantics is understood by the designer differently from the proposed semantics. This may lead to inconsistencies between the requirements as understood by the designer and the formal semantics generated by the PrUDE tool. Expressing the requirements in the form of conjectures and checking them against the generated semantics highlights the inconsistencies. In the future, we aim at identifying some mechanisms that will allow systematic tracking of such kinds of inconsistencies. These mechanisms would be implemented as an extended feature of the intelligent error reporting system that will be developed. The proposed framework is fully integrated with various steps of the software life cycle with a focus on model verification and review. The current framework, however, does not support code inspection. In the future, we will also investigate how the PrUDE tool can be extended with code inspection capabilities.

References

[1] A. Dunsmore, M. Roper and M. Wood. The Development and Evaluation of Three Diverse Techniques for Object-Oriented Code Inspection. IEEE Transactions On Software Engineering, 29(8), August 2003.
[2] D. B. Aredo. A Framework for Semantics of UML Sequence Diagrams in PVS. Journal of Universal Computer Science, 8(7):674–697, July 2002.
[3] V. Basili. Evolving and Packaging Reading Technologies. Systems and Software, 38(1):3–12, 1997.
[4] L. Briand and Y. Labiche. A UML-Based Approach to System Testing. In M. Gogolla and C. Kobryn, editors, Proc. of 4th UML International Conference (UML2001), volume 2185 of LNCS, Toronto, Canada, Oct. 2001.
[5] R. N. Britcher. Using Inspections to Investigate Program Correctness. IEEE Computer, November 1988.
[6] M. Broy.
On the Meaning of Message Sequence Charts. In ECOOP’97, Mehmet Aksit, Satoshi Matsuoka (ed.), volume LNCS 1241, Jyväskylä, Finland, June 1997. Springer Verlag.
[7] G. Caplat and J.-L. Sourouille. Model Mapping in MDA. In Proceedings of the Workshop WISME UML’2002, Dresden, Germany, 2002.
[8] W. Damm and D. Harel. LSC’s: Breathing Life into Message Sequence Charts. In Formal Methods for Open Distributed Systems (FMOODS’99), Florence, Italy, February 15-18, 1999.
[9] R.-K. Doong and P. G. Frankl. The ASTOOT Approach to Testing Object-Oriented Programs. ACM Transactions on Software Engineering and Methodology, 3(2), 1994.
[10] A. Dunsmore, M. Roper, and M. Wood. Systematic object-oriented inspection-an empirical study. In Proc. of 23rd Int’l Conf. on Software Eng. (ICSE’01), pages 135–144. IEEE Computer Society, May 2001.
[11] S. Easterbrook, R. Lutz, R. Covington, J. Kelly, Y. Ampo, and D. Hamilton. Experiences Using Lightweight Formal Methods for Requirements Modeling. IEEE Trans. on Soft. Eng., 24:4–14, Jan. 1998.
[12] Gregor Engels, Jochen M. Küster, Reiko Heckel, and Marc Lohmann. Model-Based Verification and Validation of Properties. In Roswitha Bardohl and Hartmut Ehrig, editors, Electronic Notes in Theoretical Computer Science, volume 82. Elsevier, 2003.
[13] A. Evans. Reasoning with UML Class Diagrams. In the Proc. of WIFT’98. IEEE Press, 1998.
[14] M. Fagan. Design and Code Inspections to Reduce Errors in Program Development. IBM Systems Journal, 15(3):182–211, 1976.
[15] Formal Systems Europe (Ltd). Failures-Divergence-Refinement: FDR2 User Manual, 1997.
[16] R. B. France, J.-M. Bruel, M. Larrondo-Petrie, and M. Shroff. Exploring the Semantics of UML Type Structures with Z. In H. Bowman and J. Derrick, editors, the Proc. 2nd IFIP Conf. Formal Methods for Open Object-Based Distributed Systems (FMOODS’97). Chapman and Hall, London, 1997.
[17] R. B. France, A. Evans, K. Lano, and B. Rumpe. The UML as a Formal Modeling Notation.
Computer Standards & Interfaces, 19:325–334, 1998.
[18] T. Gilb and D. Graham. Software Inspection. Wokingham: Addison-Wesley, 1993.
[19] D. P. Gluch and C. B. Weinstock. Model-Based Verification: A Technology for Dependable System Upgrade. Technical Report CMU/SEI-98-TR-009, Software Engineering Institute, Carnegie Mellon University, Pittsburgh, Pa., USA, Sep. 1998.
[20] M. Heimdahl and N. Leveson. Completeness and Consistency Analysis of State-Based Requirements. IEEE Trans. On Soft. Eng., 22:363–377, November 1996.
[21] Ye Hong. UML-based Testing of Object-Oriented Programs, July 2003. Master Thesis, Dept. of Electrical and Computer Engineering, University of Victoria.
[22] I. Jacobson. A Resounding Yes to Agile Processes-but also to more. Cutter IT Journal, 15(1), January 2002.
[23] C.B. Jones. Systematic Software Development using VDM. Prentice-Hall, Englewood Cliffs, NJ, 2nd edition, 1990.
[24] P. Kruchten. The Rational Unified Process. Addison Wesley, Sept. 1999.
[25] D.C. Kung, N. Suchak, J. Dao, and P. Hsia. On Object State Testing. In IEEE COMPSAC’94 Conference, Feb. 26 1994.
[26] O. Laitenberger, C. Atkinson, M. Schlich, and K. El Emam. An Experimental Comparison of Reading Techniques for Defect Detection in UML Design Documents. Systems and Software, pages 183–204, 2000.
[27] O. Laitenberger, C. Atkinson, M. Schlich, and K. El Emam. Using Inspection Technology in Object-oriented Development Projects, June 2000. Technical Report NRC/ERB-1077.
[28] D. Latella, I. Majzik, and M. Massink. Towards a Formal Operational Semantics of UML Statechart Diagrams. In the Proc. of FMOODS’99, Florence, Italy. Kluwer, February 15-18, 1999.
[29] M. Lawford, P. Froebel, and G. Moum. Practical Application of Functional and Relational Methods for the Specification and Verification of Safety Critical Software. Lecture Notes in Computer Science, 1816, 2000.
[30] R. C. Linger. Cleanroom Process Model. IEEE Software, 11(2):50–58, March 1994.
[31] M. Y. Liu.
PVS Proof Patterns for UML-based Verification, October 2002. Master Thesis, Dept. of Electrical and Computer Engineering, University of Victoria.
[32] W.Q. Liu, S. Easterbrook, and J. Mylopoulos. Rule-based Detection of Inconsistency in UML Models. In L. Kuzniarz, G. Reggio, J. Sourouille, and Z. Huzar, editors, Proceedings of the Workshop on Consistency Problems in UML-based Software Development-UML’2002, pages 106–123, Dresden, Germany, 2002.
[33] W.E. McUmber and B. Cheng. A General Framework for Formalizing UML with Formal Languages. In Proc. of IEEE International Conference on Software Engineering (ICSE01), Toronto, Canada, May 2001.
[34] E. Mikk, Y. Lakhnech, and M. Siegel. Hierarchical Automata as Model for Statecharts. In K. Ueda R. K. Shyamasundar, editor, the Proc. of Asian Computing Science Conference (ASIAN’97), volume 1345 of LNCS, pages 181–196. Springer Verlag, December 9-11, 1997.
[35] OMG. OMG Unified Modeling Language Specification, version 2.0, June 2003. OMG standard document.
[36] S. Owre, J. Rushby, N. Shankar, and F.V. Henke. Formal Verification for Fault-tolerant Architectures: Prolegomena to the design of PVS. IEEE Transactions on Software Engineering, 21(2):107–125, February 1995.
[37] D.L. Parnas and D. M. Weiss. Active Design Reviews: Principles and Practices. Journal of Systems and Software, pages 259–265, 1987.
[38] R. W. Selby and V. R. Basili. Cleanroom Software Development: an Empirical Evaluation. IEEE Trans. on Soft. Eng., 13(9):1027–1037, 1987.
[39] P. Stocks and D. Carrington. A Framework for Specification-Based Testing. IEEE Trans. On Soft. Eng., 22(11):777–793, 1996.
[40] T. Thelin, P. Runeson, and B. Regnell. Usage-based Reading - an Experiment to Guide Reviewers with Use Cases. Journal of Information and Software Technology, 43(15):925–938, 2001.
[41] I. Traoré. An Outline of PVS Semantics for UML Statecharts. Journal of Universal Computer Science, 6(11):1088–1108, 2000.
[42] I. Traoré.
An Integrated V&V Environment for Critical Systems Development. In the Proc. of 5th IEEE International Symposium on Requirements Engineering, Toronto, Canada, August 2001. 190 7.2 Future Work [43] I. Traore. A Transition-based Testing Strategy for Object-Oriented Programs. In Proc. of ACM Symposium on Applied Computing (SAC03), Melbourne, Florida, USA, March 9-12, 2003. [44] I. Traoré, D. B. Aredo, and H. Ye. An Integrated Framework for Formal Development of Distributed Systems. In Proc. of ACM Symposium on Applied Computing (SAC03), Melbourne, Florida, USA, March 9-12, 2003. [45] I. Traoré, A. Jeffroy, M. Romdhani, and A.E.K. Sahraoui. An Experience with a Multiformalism Specification of an Avionics System. In the Proc. INCOSE 98, Vancouver, Canada, July 25-31, 1998. [46] E. van Emden and L. Moonen. Java Quality Assurance by Detecting Code Smells. In the Proc. of 9th Working Conference on Reverse Engineering (WCRE’02), pages 97–108, Richmond, Virginia, USA, October 2002. IEEE Computer Society Press. [47] J. B. Warmer and A. G. Kleppe. The Object Constraint Language: Precise Modeling with UML. Addison Wesley Longman Inc., 1999. 191 192 Appendix H Formal System Development Using Method Integration: a Case Study D. B. Aredo and O. Owe Publication: D. B. Aredo and O. Owe: Formal Development Using Method Integration: a Case Study, Research Report no. 308, Department of Informatics, University of Oslo, August 2004. Formal System Development Using Method Integration: a Case Study∗ Demissie B. Aredo1 and Olaf Owe2 1 Norwegian Computing Center P. O. Box 114 Blidern, N-0314 Oslo, Norway. 2 Department of Informatics, University of Oslo P. O. Box 1080 Blidern, N-0316 Oslo, Norway. Abstract In this paper, we demonstrate feasibility of a development framework that integrates semi-formal graphical modeling techniques with formal methods (FMs). 
In particular, the framework integrates the Unified Modeling Language (UML) with the PVS environment to exploit the synergy between them. System descriptions are given in the graphical UML notations and translated into PVS specifications based on semantic definitions that we have proposed for the UML notations. The resulting semantic models are rigorously analyzed using the PVS toolkit. The translation of UML models into PVS specifications is automated by the PrUDE tool. This work contributes to improving the use of FMs in the development of highly dependable systems in industrial settings, and narrows the gap between the theoretical foundation underlying FMs and their practical application.

Keywords: Formal Methods, UML, OCL, OUN, PVS, Method Integration

∗ Published as Research Report No. 308, Department of Informatics, University of Oslo, August 2004.

1 Introduction

Semi-formal object-oriented analysis and design (OOAD) techniques such as the UML (Unified Modeling Language) [28] have become quite popular among software developers. The structuring mechanisms and the intuitively appealing graphical notations are among the features that have contributed to their acceptance. Their major limitation in the context of critical systems development is, however, the lack of precise semantic definitions for their notations, which is a significant barrier to their application in industrial settings. A greatly improved development process can be obtained if tools are augmented with deeper semantic analysis of the graphical models [45]. On the other hand, formal methods (FMs) [46] have enormous potential in the development of highly dependable systems, and are increasingly finding practical uses due to recent developments towards automated tools.
FMs are development approaches based on a mathematical foundation. They allow precise and rigorous specification of system requirements, and are used to ensure that the final software product meets the initial expectations of the customer in terms of functionality as well as quality. Despite this rigor, the practical usability of formal verification approaches is limited by their esoteric nature. The focus of this paper is a framework that integrates a semi-formal modeling language, namely the UML, with a formal verification environment, namely PVS, together with a supporting tool.

The main objective of formal development methods is to specify system behavior and desired functionalities precisely, and to verify that the system meets the original requirements. Formal specification is the basis for meaningful and rigorous analysis of system properties. Some verification environments provide specification languages tailored towards a specific application domain together with a simulator, a model checker, or both, e.g. LOTOS [18] and the SPIN system [16]. Due to features inherent in distributed systems, e.g. concurrency, dynamic reconfiguration, and complexity, a simulation can examine only a fraction of the possible system runs. Techniques related to model checking, on the other hand, explore all possible runs exhibited by a finite-state machine describing the system. Model checking has become very popular because experience indicates that checking all runs is more effective in finding bugs [35], while it requires little or no insight into the formalism and no user interaction. Model checking can also be complemented with interactive proof checking if necessary. A major limitation of model checking is that the state space must be finite, even though advances involving symbolic execution have been made.
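The contrast between simulation and exhaustive exploration can be illustrated with a minimal sketch of explicit-state invariant checking. This is not the algorithm of SPIN or of the PVS model checker; the function `check_invariant` and the toy two-counter system are hypothetical names introduced here only for illustration.

```python
from collections import deque

def check_invariant(initial, successors, invariant):
    """Visit every reachable state of a finite-state system breadth-first.

    Returns None if the invariant holds in all reachable states,
    otherwise a shortest run (list of states) leading to a violation.
    """
    parent = {initial: None}          # also serves as the visited set
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            run = []
            while state is not None:  # rebuild the run back to the start
                run.append(state)
                state = parent[state]
            return run[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None

# Hypothetical toy system: two counters, each stepping modulo 3.
def successors(state):
    a, b = state
    return [((a + 1) % 3, b), (a, (b + 1) % 3)]

# Assumed safety property: the counters are never both equal to 2.
counterexample = check_invariant((0, 0), successors, lambda s: s != (2, 2))
print(counterexample)
```

A single simulation run of the toy system might never reach the violating state, whereas the exhaustive search is guaranteed to find it because the state space is finite.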
The benefits of introducing FMs into a development process include:

- Improved understanding of system requirements and reduced errors and omissions;
- Possibility to check consistency and completeness of system specifications, and to prove that an implementation conforms with the specifications;
- Semantically-based CASE tools can be built to assist developers in analysis, design, implementation and program debugging. They may also support animation and execution of formal specifications to provide a prototype of the system; and
- Formal specifications are used as guidelines in the identification of appropriate test cases and their evaluation.

Despite all these benefits, FMs still have difficulties in breaking through in the software industry. Very few organizations or projects are using FMs. A number of reasons have been put forward as to why formal development methods have not been widely used in the software industry [36]:

- FMs are considered esoteric, due to the lack of training for software engineers in discrete mathematics and logic at the required level. Moreover, customers are unlikely to be familiar with FMs, and hence they are not willing to pay for development activities they cannot monitor; and
- Lack of tool support: most of the effort in research on formal methods has focused on the development of languages and their mathematical underpinning, and less effort has been devoted to tool support.

As argued by Sommerville [36], the major challenge facing the software community is not developing new techniques and methods, but transferring the existing software engineering research results into the software industry. To address this issue, a number of strategies for introducing FMs into the software development process have been proposed by the research community. Most of the strategies [11, 24, 42] advocate a lightweight and selective application of FMs using visual modeling notations such as the UML [28] as a front-end.
FMs are used solely for analyzing specific aspects or properties of a system. The baseline specification used to conduct further development activities is created and maintained using the graphical notations familiar to and popular among software developers.

In [41, 39], we proposed a development framework integrating the UML specification techniques [28, 34] with the Prototype Verification System (PVS) [30] to support formal development of distributed systems. The integrated approach makes the following major contributions to the software engineering process:

- A formal specification of syntactic well-formedness constraints for UML in the PVS specification language, which significantly improves the acceptance of FMs among software developers by enhancing the development process with OOAD techniques, and which is supported by a CASE tool.
- Defining a formal semantics of the graphical modeling language addresses the limitations of OOAD techniques in the context of the development of highly dependable systems by making UML models amenable to formal analysis.

In the sequel, we demonstrate the practical usability of the integrated approach by presenting an example of a security-critical system. Major components and concepts of the framework and a supporting CASE tool are revisited to make this paper self-contained.

1.1 Outline of the Report

The rest of the report is outlined as follows. In Section 2, major aspects of the development framework and the supporting CASE tool, namely the PrUDE tool, are briefly revisited in order to make the report self-contained. Our focus is mainly on concepts and notations that might be encountered in later sections. In Section 3, we demonstrate the practical usability of the integrated platform and the supporting tool by presenting an example of the development of a security-critical system. Finally, in Section 4, we summarize, draw some conclusions and discuss future research issues.
2 The Integrated Platform Revisited

The development of critical systems such as e-banking and access control systems requires a high level of rigor and reliability. Integrating formal methods (FMs) into a software development process improves software quality and reliability by revealing subtle errors that might otherwise not be discovered until it is too late and too expensive to fix them. It also increases productivity by supporting the development of semantically-based tools. Usually, developers describe different aspects of a system using several description techniques and notations. For instance, one might want to describe the functional behavior of a system as a composition of the functional behaviors of the modules constituting the system. Moreover, one might want to specify structural relationships between the modules, e.g. which modules may communicate directly. At the time of this writing, there is no single description technique or notation that can conveniently capture the complete behavior of a system from different viewpoints, and at the level of rigor necessary for reasoning about reliable systems. Hence, integrating several specification techniques, notations, and formalisms is necessary.

When several description techniques and notations are involved in a development platform, using a common underlying semantic domain is essential. This significantly reduces the effort needed to check consistency across language boundaries, by allowing reasoning about system properties in a uniform manner. As mentioned in the previous section, when it comes to practical applications, both the semi-formal OOAD techniques and the FMs have inherent strengths and limitations. We argue that a development platform that pulls together the strengths of FMs and OO graphical modeling techniques significantly improves the reliability of critical systems.
The main objective of the method integration approach is to obtain a development framework and a supporting tool that enhance the application of FMs in an industrial setting, and at the same time make the OOAD techniques amenable to rigorous analysis.

2.1 Notations and Formalisms

In the rest of this section, we present a brief overview of the notations and formalisms involved in the integrated platform. We do not present a complete tutorial on the notations; instead, we focus only on key features that will be encountered in later sections. For detailed presentations, interested readers should refer to the relevant literature.

2.1.1 The Unified Modeling Language

The Unified Modeling Language (UML) [28, 34] provides a set of standard notations and modeling techniques for specifying, visualizing, and documenting artifacts of software systems. UML supports a highly iterative, distributed software development process, where every stage of the software life cycle, e.g. requirement analysis and design, can be specified by using a combination of different description techniques. Our work is based on UML 1.3. At the time of this writing, there is no standard formal semantics for UML notations, and this makes the development of semantically-based CASE tools a difficult task. Most tool vendors use in-house semantic definitions for UML notations. In the UML standard [28], a semi-formal semantic guideline is provided for developers of UML tools.

Static structural system properties can be specified by UML diagrams such as class and component diagrams, whereas dynamic properties can be captured by diagrams such as interaction diagrams, statecharts, and activity diagrams. An interaction is specified by a sequence diagram consisting of a list of messages exchanged between the objects involved in the interaction.
A sequence diagram is a particular type of diagram describing a specific pattern of interaction between objects in terms of the messages exchanged as the interaction unfolds over time to effect the desired property. A message is a specification of a communication between objects, or between an object and its environment, conveying information with the expectation that an activity will ensue. A sequence diagram specifies the roles of the objects, i.e. sender or receiver, as well as the associated action that causes the communication to take place. However, it conveys a possible behavior rather than restricting all possible behaviors. UML sequence diagrams are an efficient technique for describing scenarios of systems with time-dependent functionality, such as real-time applications. The simplicity of sequence diagrams makes them suitable for specification of intended behavior that can easily be understood by every stakeholder: customers, requirements engineers, and software developers alike [45].

We are interested only in externally visible properties of objects and ignore internal changes. We distinguish between the send and receive events associated with each message when modeling the behavior of objects participating in the interaction specified by a sequence diagram. Hence, in a specification of a message, the correspondence between the send and receive events constituting the message has to be established. In our framework, a message is interpreted as a pair of send and receive events, and a sequence diagram is interpreted as a set of traces of events satisfying some specific properties, such as the causality and the general ordering requirements [3]. UML supports the notion of time (see [28, chap. 3, p. 98]) and allows specification of the time when a message is sent and received. The notion of time can be captured by stamping events with the time of their occurrence.
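The interpretation of a message as a pair of time-stamped send and receive events can be sketched as follows. This is an illustrative Python sketch, not the PVS encoding used in the framework; the names `Event`, `message` and `is_causal`, the example objects, and the assumption that message names are unique within a trace are all introduced here for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    kind: str       # "send" or "receive"
    message: str    # name identifying the message (assumed unique in a trace)
    obj: str        # object on which the event occurs
    time: float     # global time stamp of the occurrence

def message(name, sender, receiver, sent_at, received_at):
    """A message is interpreted as a pair (send event, receive event)."""
    return (Event("send", name, sender, sent_at),
            Event("receive", name, receiver, received_at))

def is_causal(trace):
    """Check that time stamps never decrease and that every receive
    event is preceded by the send event of the same message."""
    sent = set()
    previous_time = float("-inf")
    for event in trace:
        if event.time < previous_time:
            return False
        previous_time = event.time
        if event.kind == "send":
            sent.add(event.message)
        elif event.message not in sent:
            return False
    return True

# Hypothetical scenario: a customer asks a bank for a withdrawal.
request = message("withdraw", "customer", "bank", 0.0, 1.5)
reply = message("ok", "bank", "customer", 2.0, 3.0)

good_trace = [request[0], request[1], reply[0], reply[1]]
bad_trace = [request[1], request[0]]   # receive listed before its send

print(is_causal(good_trace))
print(is_causal(bad_trace))
```

Only the causality requirement is checked in this sketch; the general ordering requirement of [3] is not modeled.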
Time stamps are useful for expressing temporal properties of traces, e.g. the minimum time interval between the occurrences of two events. Stamping events with global time is crucial, for example, for obtaining the global history by merging traces of events, interleaving the events in the temporal order of their occurrences. The resulting trace is a specification of the global history of the object under consideration. An object participating in an interaction is represented as a set of infinite and finite traces reflecting, respectively, non-terminating and terminating executions. For safety properties, finite trace semantics is sufficient to specify the behavior of a system over a finite time interval. Hence, we define the semantics of a sequence diagram as a prefix-closed set of finite traces, represented in PVS-SL as a set of lists of events.

2.1.2 The Object Constraint Language

The abstract syntax of UML constructs is given in terms of UML meta-models, using UML class diagrams enhanced with textual annotations. The graphical UML models are not expressive enough for precise and unambiguous specifications, so additional constraints on objects in UML models need to be described. In the UML standard [28], constraints on modeling elements are given as a set of well-formedness rules expressed in the Object Constraint Language (OCL) [44], complementing the English language. OCL is a specification language extension to the UML notation, provided as a part of the UML standard since UML v1.3 [28]. OCL is an expression language that enables developers to formulate constraints and object queries in the context of UML models. OCL expressions are used to specify invariants attached to static structural elements such as classes and types, pre- and postconditions of operations, and guards for state transitions. OCL is a declarative language, not a programming language, i.e.
evaluation of OCL expressions does not have side effects on the associated UML model. Consequently, it is not possible to write program logic or control flow in OCL, or to invoke processes or activate non-query operations within OCL. As OCL is a modeling language, all implementation issues, except their correctness, are outside its scope. Hence, unlike some other formal languages such as Z [37], OCL specifications (especially invariants) are not easily convertible into program code. However, in the development of larger systems, attention to the implementation is needed, as it would not be feasible to back off in the middle of the development and start coding from scratch. A number of tools for parsing and checking the syntax of OCL specifications are available, e.g. the OCL tool [27] developed at the Dresden University of Technology, and Octopus [26] developed by Klasse Objecten.

To integrate constraints into UML models, invariants and pre- and postconditions are attached as comments to the respective modeling elements. Constraints may, however, turn out to be quite complex, with the result that they are often specified separately. The contextual modeling element is explicitly specified by the context clause. OCL is a typed language based on first-order logic. Logical operators and quantifiers of first-order logic, together with set operations, lead to a powerful and expressive language. Besides user-defined model types (e.g. classes, interfaces) and predefined basic types (e.g. integer, real, boolean), OCL has the notion of object collection types (e.g. sets, bags, sequences). Several operations, applied with the arrow notation →, are predefined on the object collection types.
For example, consider the partial description of a UML class diagram shown in Figure 1.

[Figure 1: Partial Description of a UML Class Diagram. The diagram shows an enumeration TransactionKind with literals withdraw, deposit, and transfer; a class Transaction with attributes kind : TransactionKind and amount : nat; a class Employee with attribute name : string; and an association between Transaction and Employee (multiplicities * and 1..*) with the association end approvedBy.]

The Transaction and Employee classes are related by an association with one association end called approvedBy. The following OCL expression specifies that each transaction of kind withdraw or transfer involving an amount of funds above $10000 must be approved by at least two employees.

    context Transaction inv:
      (self.kind = withdraw or self.kind = transfer) and self.amount > 10000
        implies self.approvedBy->size() >= 2

Let us briefly explain the parts of the above OCL expression. The class name following the keyword context specifies the class for which the invariant is defined. The keyword inv indicates that this expression is a specification of an invariant, i.e. the expression must evaluate to true for each object of the context class. An invariant may, however, be violated during the execution of an operation. In other words, an invariant must hold for an object whenever none of its operations is executing. The keyword self is optional and refers to the object for which the expression is evaluated. Attributes, operations, and associations of the object can be accessed by dot notation, e.g. self.approvedBy results in the set of objects of class Employee associated with the Transaction object for which the invariant is currently evaluated. The arrow notation (→) indicates that the collection of objects preceding the arrow is manipulated by the predefined OCL operation following the arrow. For example, for a given collection c, the expression c→size() returns the number of elements in the collection.

There is a point to be made about constraints and inheritance in object-oriented models.
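Before turning to inheritance, the reading of the invariant on Transaction given above can be made concrete with a small sketch. The Python classes below are a hypothetical rendering of the classes in Figure 1 (the attribute name `approved_by` and the function `invariant_holds` are illustrative assumptions); they are not output of the PrUDE tool, and OCL itself is not executed here.

```python
from dataclasses import dataclass, field

@dataclass
class Employee:
    name: str

@dataclass
class Transaction:
    kind: str                      # "withdraw", "deposit" or "transfer"
    amount: int
    approved_by: list = field(default_factory=list)  # association end approvedBy

def invariant_holds(t):
    """Mirror of the OCL invariant: (A and B) implies C is read as
    (not (A and B)) or C, evaluated for one Transaction object."""
    needs_approval = t.kind in ("withdraw", "transfer") and t.amount > 10000
    return (not needs_approval) or len(t.approved_by) >= 2

alice, bob = Employee("Alice"), Employee("Bob")

small = Transaction("withdraw", 500, [alice])
large_ok = Transaction("transfer", 20000, [alice, bob])
large_bad = Transaction("withdraw", 20000, [alice])
deposit = Transaction("deposit", 50000, [alice])

for t in (small, large_ok, large_bad, deposit):
    print(t.kind, t.amount, invariant_holds(t))
```

Like the OCL expression, the check has no side effects on the objects it inspects; it only reports whether the constraint is satisfied at the moment of evaluation.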
In object-orientation, it is a rule that classes at the lower levels of an inheritance hierarchy are always more specialized and concrete than the abstract classes at the higher levels. This principle continues to hold for constraints, in that a subclass may strengthen constraints inherited from its superclass. In other words, a subclass inherits constraints from its superclass, and may have additional constraints. This may cause problems where classes are freely reused.

Constraints are specifications of conditions that should not be violated. However, OCL v1.0 does not describe the measures to be taken in case a constraint is violated. As OCL is an expression language, one may argue that no action needs to be taken, and that the model will simply be in an invalid state. Kleppe et al. [23], however, proposed an extension of OCL with action clauses. The action semantics and object query language definitions are among the main features added to OCL v2.0, which is a part of UML v2.0.

The semantics of OCL expressions is described informally in the standard document [28]. Richters et al. [33] proposed a formal semantics for the OCL constructs. Several extensions of OCL have been proposed in the literature. Flake et al. [12] propose a temporal extension of OCL that enables developers to specify behavioral state-oriented constraints, and present a formal semantics of state-oriented constraints [13]. We have given a brief summary of the basic concepts of OCL used in later sections, and refer the interested reader to the latest proposal of the OCL 2.0 language definition [43] for more details.

2.1.3 Motivation for Creating a more Expressive Language

The main goal of the ADAPT-FT project is to develop a platform supporting precise modeling of systems that are distributed, object-oriented, and open. We wished to address high-level specification of such systems, as well as high-level models and implementations, based on a semantical foundation enabling formal methods suitable for the setting of open distributed systems.
In order to integrate well with UML, we deliberately used well-known UML concepts, and developed a modeling language that may act as a textual counterpart to more graphical languages, with enough additional expressiveness to capture complete behavior. The language, known as OUN, includes executable imperatives for high-level system implementation, as well as a non-executable sub-language for system specification purposes. A compiler from OUN implementations to Java was developed, allowing execution of OUN programs, as well as an executable operational semantics in Maude [8].

We wished to contribute to the research direction of developing observable specifications of components, allowing top-down design of components where a "black box" specification of the observable behavior of aspects of the component comes before the design of its internal structure. This is a development strategy recommended by theoreticians as well as practitioners; however, according to the state of the art, it seems that the questions of how to formulate behavioral specifications, and how to integrate them into an object-oriented setting, are not quite settled, at least when considering specification methods understandable for programmers without special mathematical training. In contrast, the state-based style of specifying components requires the definition of a state space within the components; requirement specifications can then be given by means of invariants expressed in, say, first-order logic, or by means of temporal requirements expressed in temporal logic. OCL is oriented towards specification of invariants and pre- and postconditions by means of a language built upon first-order logic (with some adjustments). In particular, it does not support specification of observable behaviors of objects and components.
We therefore found it interesting to develop OUN [29], which allows observable specification of (component) interfaces, supports aspect-oriented specification as well as specification of assumed or required environmental behaviors, and provides implementation of interfaces through (component) classes defining state space, invariants, and imperative implementations of methods. In the language, a component is captured by an object of such a class, equipped with a local processor and a local "run" method. Distribution is enhanced by facilities for asynchronous communication, and object orientation is maintained by staying within a generalization of remote method invocation. High-level language constructs for programming processor release points and passive waiting, through nested guards, allow components to change dynamically from active to reactive behavior, and give reasonable efficiency control at a high level. In order to support openness such as dynamic reconfiguration, a dynamic class construct is provided, allowing software components to be upgraded during execution.

Thus OUN may be used for specification purposes as well as for (high-level) implementation purposes. The language may be seen as an extension of the basic mechanisms of OCL, through the OUN mechanisms for class-level reasoning, extended to black-box specifications of the observable behavior of aspects of components. In OUN, behavioral specifications can be related to class-level (OCL-like) specifications through notions of abstraction and refinement.

Note that the OUN notation will not be used in the examples discussed in the sequel. The brief summary of OUN presented above is intended to provide an overview of the ADAPT-FT project, which greatly influences this work, by revisiting the integrated platform and the notations it involves. More details can be found in the OUN-specific papers listed at the ADAPT-FT project web site, including [9, 21, 20].
2.1.4 PVS as Underlying Semantic Domain

The Prototype Verification System (PVS) [30] is an environment for constructing precise specifications and for developing proofs that can be mathematically verified. PVS is based on a strongly typed higher-order logic with powerful verification and validation mechanisms. A salient feature of PVS is its highly expressive and strongly typed specification language (PVS-SL) [30], tightly integrated with a typechecker and an interactive general-purpose theorem prover. The PVS type system has been augmented with predicate subtyping and dependent typing mechanisms. Subtyping makes type checking more powerful by allowing stronger checks for consistency and invariance in a uniform manner. Subtyping, however, renders type checking undecidable, and proof obligations may be generated during typechecking. A great many proof obligations can be discharged automatically using the PVS theorem prover, whereas more involved ones require interaction from the user. The PVS environment provides semi-automatic tools with significant automation, including decision procedures for several common theories such as equality and linear arithmetic [30]. A particular strength of PVS is its capacity to exploit the synergy between its tools. For instance, theorem proving can be used in type checking, and information obtained from type checking and model checking can be used in theorem proving.

[Figure 2: Translations in the ADAPT-FT Platform. The figure relates UML, OUN, PVS, and Java by translation arrows.]

As the main goal of the ADAPT-FT project was to adapt, tune, redevelop, and extend formal methods towards the special needs of open distributed systems, an underlying semantical foundation was needed, preferably a foundation already implemented with a series of powerful tools.
PVS [30, 31] was a natural choice in this respect, especially due to its strong type system and functional sub-language, covering inductive data types and inductively defined functions, and its reasoning capabilities and tools, including some model checking facilities. PVS provides a vehicle for defining the semantics of the OUN language in a precise manner, and for defining the associated specification formalism, including concepts for refinement and composition, while at the same time allowing development and reuse of the semantical definitions in the design of tools, such as reasoning tools. Even though the nature of PVS may be mathematically challenging to software engineers, a semantical basis is needed, from which engineering tools that are less esoteric may be developed. For instance, in the ADAPT-FT platform, which integrates UML, OUN, Java and PVS, and translates UML to OUN, Java and PVS, and OUN to Java and PVS (see the arrows in Figure 2), one may develop tools at the level of UML diagrams or OUN programs, where the implementation of the tool is done at the PVS level (by means of PVS translations). Tools giving yes/no answers require no insight into PVS, and may provide useful feedback to the engineer. It would of course be desirable to have tools giving UML or OUN related feedback, built from PVS related tools; however, this is beyond the scope of the ADAPT-FT project.

2.2 Semantics of UML Notations in PVS

Rigorous analysis of UML models of large applications involves manipulation of huge software artifacts, in which case tool support is crucial. This in turn calls for formal semantic definitions for the graphical UML notations. A formal semantics facilitates verification, validation and simulation of models, and improves the quality of models and software design.
In our case, formal semantic definitions for the UML notations are proposed by representing them in a well-founded formalism, namely the PVS specification language (PVS-SL). A semantic definition for a UML sequence diagram captures the properties that a system is expected to exhibit, i.e. the system interaction described by the sequence diagram. Assumptions and invariants on the system are expressed in the PVS specification language as axioms and conjectures, respectively. A trace of events specifies a possible run of the application specified by the sequence diagram if and only if the trace satisfies the requirements stated as predicates, provided that the assumptions are fulfilled. For instance, for a trace that specifies a possible scenario of the interaction specified by the sequence diagram, and a given object participating in the interaction, the projection of the trace onto the set of events on the object must satisfy the requirements on the traces of the object. The requirements are stated as predicates on the set of traces of events. Static semantic constraints on modeling elements, given as a set of well-formedness rules expressed in the Object Constraint Language (OCL) [44], can be specified similarly.

The formalization approach adopted for UML statecharts consists of the definition of a set of elementary predicates describing properties of system states or operations. The set of elementary predicates is then partitioned into elementary states and events. A state describes a condition of the system that has a non-zero duration. We make a clear distinction between concrete states of the system and the abstract notion of states in UML statecharts. We define three categories of predicates, associated with the notions of state vertex, guard condition, and action, respectively. The predicate associated with a state corresponds to a condition that must hold for the state to be activated.
The predicate associated with an action corresponds to a condition that holds after the execution of the action; it can be understood as the action's postcondition. Whereas the state and guard conditions are boolean functions of the values of the state variables before the execution of an operation starts, the postcondition is a boolean function of the values of the state variables both before and after the execution of the operation. A transition is enabled if the event instance generated matches its trigger, its guard condition is true and its source state is active. An enabled transition may be eligible for firing. Firing a transition will activate its target state and execute its action.

2.3 Tool Support

Tool support is a crucial component for successful application of a development framework in industrial settings. A CASE tool enables developers to manage large-scale projects, which usually involve manipulation of large software artifacts, and reduces development time by enabling them to discover subtle errors automatically. Experience shows that even the most carefully crafted formal specifications and proofs can still contain inconsistencies, omissions and other errors [14]. To address this issue, we have developed a research platform called the PrUDE (Precise UML Development Environment) tool [5]. PrUDE integrates the UML [28] modeling notations and the PVS [30] formalisms, and their respective tools.

Most of the commercial UML tools support only syntactic checks and code generation. Semantic checks are crucial in the development of critical systems, and hence it is necessary to integrate UML tools with a verification environment. In this regard, we use the PVS specification and verification environment and its toolkit in developing our CASE tool, namely the PrUDE tool, to support not only formal verification but also testing and structured reviews.
The PrUDE tool supports automated generation of formal specifications in PVS from UML models, based on the UML semantics proposed in [1, 3, 4, 38]. UML models along with business rules are translated into PVS so that theorem proving can be exploited in checking their validity and consistency. The resulting specification is the input to the PVS verification toolkit running at the back-end. The PrUDE tool suite supports well-formedness checking, consistency checking, model checking, proof checking and testing. The design models are created using a UML tool, whereas model analysis steps are performed using the PVS toolkit. The interface of the PrUDE tool to UML tools is based on XMI [22], thus providing an explicit data exchange format. Since most of the existing UML tools support model exchange in the XMI format, the PrUDE platform is tool vendor independent, making it easily adaptable to existing software development environments.

A major strength of the PrUDE tool is that it allows developers to deal with the graphical UML models they have created, with minimal interaction with the formal specifications generated from the models and processed at the back-end. The latter is achieved by identifying and implementing proof strategies that provide automated solutions for verification of system properties based on the formal semantic definitions. Test cases are generated from UML models that are valid, i.e. well-formed and model checked successfully. The PrUDE tool provides an automatic test case generator and a test execution component.

2.3.1 V&V Strategy in the PrUDE Platform

The V&V strategy underlying the PrUDE platform is shown in Figure 3. The rectangular boxes denote major activities, whereas the ellipses denote the resulting artifacts. The main steps in the formal V&V process using the PrUDE tool are summarized below.

- Start by developing a design model using any UML CASE tool that supports model exchange in the XMI format. The UML models in the sequel are developed using the ArgoUML v0.12 [17] tool.
- Describe properties of the modeling elements more precisely by adding suitable assertions. The assertions can be specified either in standard mathematical notation or as OCL expressions.
- Export the UML model in the XMI format, then invoke the PrUDE tool and import the generated XMI file. That means, a project in the PrUDE tool consists of a UML model, possibly augmented with business rules expressed as OCL constraints [44]. By using the PrUDE tool we can check well-formedness of the UML models, generate semantic models in the PVS specification language, and analyze the resulting semantic models. Translation of UML models into PVS results in specification templates that include generic assertions, such as well-formedness rules defining the static semantics of UML models, which serve as the basis for the verification process. To perform a meaningful analysis, we need to complete the specification by adding some domain-specific assertions using the PVS property editor.
- Finally, analyze the semantic models by invoking PVS tools within the PrUDE tool. Type-checking, model-checking, and proof-checking are among the major analysis steps. In PrUDE, the PVS theorem prover can be invoked either in batch mode or in an interactive mode that allows users to guide the proof steps. If a verification step fails, a PVS log file consisting of messages indicating errors or omissions is output. We interpret the messages, trace the discovered errors back to the UML model, fix the errors and iterate through the above steps.
Figure 3: V&V Strategy Underlying the PrUDE Platform (diagram labels: OCL business rules, UML spec, semantic conversion, OCL2PVS translation, PVS model, type-checking, well-formedness checking, validation/verification, model-checking, proof-checking, error, valid UML model, code generation, test case generation, test cases, program, test execution, test coverage analysis)

If the verification process is successfully completed, i.e. a valid UML model is obtained, we proceed with the development process using the UML models. We may refine them to achieve an implementation of the system. The resulting program code can be tested using the PrUDE tool based on the UML specification. Test cases are generated from the valid UML model obtained after a series of V&V steps. The test cases are derived from various constraints related to the model, e.g. invariants, pre- and post-conditions. The current version of the PrUDE tool provides an automatic test case generator and a test execution component for Java programs.

2.3.2 Known Limitations of the PrUDE Tool

The PrUDE tool is a research prototype developed to automate some aspects of the formal development framework we proposed. The PrUDE tool has some known limitations, mainly with respect to implementation-related issues. Firstly, the translation of system properties described in OCL expressions into PVS is done manually in the current version of the PrUDE tool. Hence, developers are expected to be familiar with the OCL notation, and to be able to use it to express business rules. In the future, the PrUDE tool will be extended with a component that automatically translates and integrates OCL expressions into PVS specifications, which should be rather straightforward. Moreover, the semantic definitions should be extended and more proof strategies should be developed for the verification of domain-specific properties.

Another shortcoming of the PrUDE tool is that feedback from the PVS theorem prover, in the case of a failed proof, is rendered as an error message embedded in a PVS message. By using the contextual vocabulary of the application domain in both the UML models and the PVS log messages, developers can trace the cause of an error message. However, the error message provides little support for automated tracing of the component in the UML model that contains the error. In the future, we will implement a parser that interprets the PVS error messages and translates them into plain text understandable to the developers.

3 Case Study: a Banking System

In this section, we illustrate the practical usability of the integrated framework we proposed [41] and of the PrUDE tool by presenting an example of the formal development of a critical system - an electronic banking system. A typical banking system consists of the following main components:

- a set of account numbers;
- an account master file: a data structure for storing the current balance for each account;
- a list of transactions performed on the accounts during a given period of time;
- a set of journals for storing transactions that are received from teller stations but not yet entered into ledgers;
- a set of ledgers for tracking the flow of funds on their way through the system;
- a set of automatic teller machines (ATMs), usually known as cash machines;
- audit trails for recording actions of employees - essential information for verification of security requirements such as non-repudiation;
- a set of program modules for overnight batch-processing of transactions, i.e. for posting the transactions into appropriate ledgers, and for updating the account master file;
- several categories of actors - customers, employees, system administrators, auditors, etc.

Online processing includes a number of program modules for adding transactions to appropriate combinations of ledgers.
For instance, if a customer has successfully deposited a certain amount of funds into an account, then a transaction is created and the same amount of funds is debited from the saving account ledger, and credited to the ledger recording the cash in the drawer. That means, a successfully completed deposit transaction involves modifications of both the drawer and the debit ledgers. This scenario is useful for monitoring the overall balance of the bank and the activities of bank employees.

3.1 Summary of System Requirements

A functional requirement specification is a description of the services that the system is expected to provide, how the system should react to a particular set of events, and how the system should behave in particular situations. The banking system is expected to provide the following list of functionalities. Note that the system requirements are significantly simplified and details are left out.

• The system must provide an authentication mechanism.
• Customers should be able to deposit, withdraw, or transfer funds, and inquire balances on their accounts.
• Customers should be provided with magnetic cards and PIN codes that will be used in the authentication process to use the ATM terminals. The ATM terminals should allow customers to choose a specific service, e.g. cash withdrawal or balance enquiry, by pressing an appropriate key on the terminal.
• Customers should be able to change PIN codes.
• Cancellation of a transaction should be allowed, if necessary, before its completion. A successfully completed transaction is kept in a journal until it is processed and posted to the appropriate ledgers and the account master file is updated.

Non-functional requirements are constraints put on the system, e.g. security requirements, and response time requirements.
For an electronic banking system, a strong security mechanism is crucial to prevent customers from cheating each other and the bank, to prevent bank employees from cheating the customers and the bank, and to provide sufficient information for reconstruction of transactions and evidence to trace illegal actions. Different security models can be implemented to achieve the security requirements. In the Clark-Wilson model [7], for instance, security critical data items are constrained so that they can only be accessed or modified by users with an appropriate level of security clearance. Data items are tagged with values specifying the level of access right required to access them, whereas actors are tagged with different levels of security clearance, resulting in an access control matrix.

3.2 UML Models for the Application Domain

3.2.1 Functional and Structural Models

Using the UML modeling techniques, major components and aspects of the banking system and its business rules can be captured from different viewpoints. System functionalities and expected behaviors can be viewed as interactions between the system and its environment - actors such as customers, bank employees, and system administrators. UML use case diagrams are a description technique for specifying, at a high level of abstraction, what the system is supposed to do. Use cases are often used in the early stages of the design process to capture the intended system requirements. For instance, the use case diagram shown in Figure 4 describes major functionalities of the banking system. A possible realization of a use case can be modelled as an interaction and can be specified by a sequence diagram.

Figure 4: A Use Case Diagram Modeling System Functionalities

Structural system properties can be captured using class diagrams in terms of classifiers and relationships between them.
This enables system developers to focus on design issues at a suitable level of abstraction by avoiding implementation details. The class diagram shown in Fig. 5, for example, models major components of the banking system: the classes Bank, Person, Account, BankCard, Transaction, Ledger, Journal, ATM, CardReader, CashDispenser, and ATMSession, and the relationships between them. The links connecting the classifiers model communication, containment, and dependency relationships. For example, the classes Account and Bank are connected by a composition relationship that specifies the fact that an instance of the class Bank contains one or more instances of the class Account, whereas an instance of the class Account is contained in exactly one bank. A class specifies the data structure of its instances in terms of attributes, and their behaviors in terms of operations manipulating the data structures. The class Account, for instance, specifies a data structure that stores the account number, the current balance on an account, and a PIN code, and operations for manipulating them.

Figure 5: Class Diagram Describing Structure of the System

Remark 3.1 The UML diagrams presented in the sequel are generated by using the ArgoUML [17] CASE tool. The stick arrowhead (→) on an association end in Figure 5 specifies the direction of navigation. The default multiplicity on an association end is 1, and association ends without explicit multiplicity assume the default value.

The structural model of the banking system is shown in Figure 5 and briefly summarized below.

• An instance of Bank may contain one or more instances of the class Account, whereas an object of the class Account belongs to exactly one Bank. A bank may own zero or more cash machines, issue zero or more bank cards, have zero or more customers, etc.
• A cash machine contains exactly one cash dispenser, one card reader, and at most one ATM session at a time.
• A transaction is associated with exactly one account, whereas an account may contain several transactions that are temporally ordered based on their time of completion.
• We assume that an account is owned by exactly one customer, whereas a customer may own several accounts. This can easily be relaxed to accommodate the case where an account is owned by a set of customers.
• There are two associations between the Transaction and the Ledger classes. This is to capture the fact that every transaction is posted to a pair of ledgers: one recording credit to the bank and the other recording debit from the bank. This enables us to effectively record the flow of funds and to monitor the overall balance of the bank.

3.2.2 UML Sequence Diagrams

UML sequence diagrams are used to specify the dynamic behavior of a system in terms of interactions between system components. They are useful for every stakeholder: they enable customers to visualize the specifics of their business processing, analysts to visualize the flow of processing, and developers to visualize the objects that need to be developed and the operations on those objects. An interaction is a possible realization of a use case described in terms of a temporally ordered list of messages exchanged between the objects involved in the interaction.

Sequence diagrams exist in two variants, namely the generic and instance forms. The generic form of a sequence diagram describes must-interactions, whereas the instance form describes may-interactions between objects. Damm et al. [10] define a variant known as Live Sequence Charts (LSCs), the main addition being the ability to assign a temperature (hot or cold) to distinguish the must and may interactions respectively. A generic sequence diagram describes the interaction of classes, and documents all of the messages that can be exchanged between objects of the classes.
An instance form of a sequence diagram describes a single possible scenario that may or may not occur. In the sequel, we consider the instance forms of UML sequence diagrams. In an implementation of a behavior specified by a sequence diagram, a message corresponds to a method call on an object involved in the interaction. In a statechart diagram, a message maps to an event that triggers a state transition.

For example, the Withdraw Funds use case shown in Figure 4 can be realized by the set of possible traces of events that lead to a successful withdrawal of funds, or to an unsuccessful attempt that is interrupted, for example, due to lack of sufficient funds in the account, or a wrong PIN code. For this discussion, we can assume that the authentication is successful. The sequence diagram shown in Figure 6 describes a scenario that leads to a successful withdrawal of funds from an ATM terminal. The interaction begins when a customer inserts a card into the card reader, which extracts information such as the account number, the balance on the account, the PIN code, etc. and opens a session that interacts with the customer. The session prompts the user to enter a PIN code, and the ATM validates the PIN code. If the PIN code is valid, a list of the available services (deposit, withdraw, or transfer funds) is displayed. The customer selects a service, Withdraw in this case, by pressing an appropriate key. The ATM session prompts the customer to enter the amount of funds to be withdrawn. When the customer enters the amount, the availability of sufficient funds on the account and of sufficient cash in the dispenser is verified. If there are sufficient funds, the ATM deducts the amount from the balance of the account and updates the information on the card.

Figure 6: Sequence Diagram for a Successful Withdraw Funds Use Case
The cash dispenser provides the cash and a receipt to the customer, and the card reader ejects the magnetic card and closes the session. The ATM completes the transaction and sends it to the banking system. The system may keep the transaction in a journal for batch processing or add it to appropriate ledgers. The balance on the account should be updated only after the transaction is completed and cash is delivered to the customer. In cases where a transaction is interrupted, e.g. due to an invalid PIN code, or insufficient funds in the account or in the cash dispenser, the system allows the customer, respectively, to reenter the PIN code a limited number of times, or to try a smaller amount of funds. If a transaction is interrupted, appropriate messages will be sent to the actors, e.g. a customer or an employee.

The sequence diagram shown in Figure 6 does not specify whether or not an account is updated before cash is successfully delivered to the user. Nor does it specify whether a successful authentication, i.e. a correct PIN code, and the availability of sufficient funds both in the account and in the cash dispenser are prerequisites for the delivery of cash.

Figure 7: Statechart Diagram for the Account Class

3.2.3 UML Statechart Diagrams

UML statecharts are used to model dynamic system properties as the complete life cycle of an individual object. This enables us to visualize interactions between the object and its environment. State machines are the basis for the specification of important security requirements [15]. To show that a given system property is fulfilled using a state machine, it suffices to identify some states satisfying that property and prove that all transitions preserve the property. In that case, if the initial state has this property, then by induction, the system property always holds.
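The inductive argument can be illustrated with a small executable sketch. It is written in Python rather than PVS, and it only checks a bounded set of states exhaustively, so it is an illustration of the proof obligation (base case plus preservation by every transition) and not a proof. The account model, the names `CREDIT_LIMIT`, `step` and `is_inductive`, the value ranges, and the reading of the credit limit as a lower bound on the balance are all assumptions made for this example.

```python
from itertools import product

# Hypothetical values chosen for illustration only.
CREDIT_LIMIT = -50                        # lowest balance the account may reach
BALANCES = range(CREDIT_LIMIT - 5, 60)    # bounded set of balances to explore
AMOUNTS = range(0, 30)                    # bounded set of event parameters
EVENTS = ("deposit", "withdraw")


def invariant(balance):
    """System property to be shown: the balance never drops below the limit."""
    return balance >= CREDIT_LIMIT


def step(balance, event, amount):
    """Next-state function; returns None when no transition is enabled."""
    if event == "deposit":
        return balance + amount
    if event == "withdraw" and balance - amount >= CREDIT_LIMIT:  # guard
        return balance - amount
    return None


def is_inductive(initial_balance, step_fn=step):
    """Base case plus induction step, checked over the bounded domain."""
    if not invariant(initial_balance):      # base case: initial state
        return False
    for balance, event, amount in product(BALANCES, EVENTS, AMOUNTS):
        if not invariant(balance):          # induction hypothesis
            continue
        nxt = step_fn(balance, event, amount)
        if nxt is not None and not invariant(nxt):
            return False                    # some transition breaks the property
    return True


def unguarded_step(balance, event, amount):
    """Faulty variant without the withdraw guard, used for comparison."""
    return balance + amount if event == "deposit" else balance - amount
```

With the guarded `step`, `is_inductive(0)` succeeds, whereas the unguarded variant is rejected because a withdrawal can take a state satisfying the property to one that violates it.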
The essential features of a state machine are the notions of state and of state transitions occurring at discrete points in time. A state is a representation of a behavior of an object, or of the system as a whole, at a given point in time, capturing exactly the aspects relevant to the problem. For example, an account can be either in the Debit state or in the Credit state. The directed links connecting the states describe transitions between the states. The possible set of state transitions can be specified by a next state function, which defines, for every state, the set of next states depending on the present state and the triggering event.

A transition is labelled by a string of the general form n:e[c]/sa, where n is a transition name, e is a trigger event, c is a guard condition, and sa is a sequence of actions. For instance, in the statechart diagram shown in Fig. 7, which models the complete life cycle of the class Account, T1, T2, ..., T7 denote transition names, withdraw and deposit are trigger events, and balance - a > 0 is a guard on the transition T2. Sequences of actions are not explicitly shown in the statechart diagram. For transitions triggered by the deposit event, i.e. transitions T3, T6, T7, the list of actions includes updating the balance with balance := balance + a, whereas the withdraw event, which triggers transitions T2, T4, T5, leads to updating the balance with balance := balance - a. In the sequence diagram shown in Fig. 6, the latter corresponds to the receiving and processing of the updateWithdraw event by an account object.

Assertions on states, guard conditions and actions in statechart diagrams are translated into PVS expressions and integrated into the semantic model using the PrUDE tool. A predicate on a state specifies a condition that must hold whenever the object with which the state machine is associated is in that state.
For instance, properties of an account, when it is in the Credit and Debit states, can be captured by the following local predicates.

    State : TYPE+
    acc : VAR Account
    Credit, Debit : VAR State
    pred(Debit) = balance(acc) < 0
    pred(Credit) = balance(acc) ≥ 0

A guard condition on a transition is a predicate that specifies the condition that must hold for the transition to fire. A guard condition can be viewed as a precondition for the operation associated with the event triggering the transition. Guard conditions on state transitions are translated into predicates in the PVS specification language. For instance, the guard conditions on the transitions in Figure 7 can be translated into the following predicates in PVS, where the guards g2, g4, g5, g6, g7 correspond to the transitions T2, T4, T5, T6, T7.

    Guard : TYPE+ = [Account, nat → bool]
    amount : VAR nat
    g2, g4, g5, g6, g7 : VAR Guard
    g2(acc,amount) = (balance(acc) - amount ≥ 0)
    g4(acc,amount) = (creditLimit + amount ≤ balance(acc)) AND (balance(acc) - amount < 0)
    g5(acc,amount) = (creditLimit + amount ≤ balance(acc))
    g6(acc,amount) = (balance(acc) + amount < 0)
    g7(acc,amount) = (balance(acc) + amount ≥ 0)

The creditLimit is an attribute of the Account class, which specifies the maximum amount of funds a customer can withdraw in debt, i.e. a fixed value that shows how far the balance on the account can go below zero. The bank may change, through negotiation and agreement with the customer, the value of the creditLimit of an account.

3.2.4 Specification of Business Rules in OCL

UML diagrams are not detailed enough to address all the relevant aspects of system specification. Among other things, we need to describe additional constraints on elements in UML models that specify conditions and properties to be maintained, e.g. data invariants, pre- and post-conditions on operations, and complex multiplicity invariants.
In this subsection, we describe some examples of constraints on the UML models given in the previous sections using OCL [44, 28] expressions.

Rule 1: An instance of the class BankCard and the Account with which it is associated must belong to the same bank. In reference to the class diagram shown in Figure 5, this property can be captured with the following invariant.

    context BankCard inv:
        self.bank = self.account.bank

Rule 2: For every instance of the class BankCard, the card holder must be the same as the owner of the account with which the card is associated.

    context BankCard inv:
        self.holder = self.account.owner

This rule can easily be modified to specify the case where an account is owned by several customers, e.g. a woman and her husband, by simply changing the type of the attribute owner to a set and the equality requirement to membership in a set.

Rule 3: The sum of the amounts of all transactions kept in the ledgers must be zero. This is equivalent to requiring that the processing of every transaction preserves the overall balance of the banking system. Symbolically,

$$\sum_{l=1}^{n} \mathit{amount}(l) = 0 \qquad (3.1)$$

where l is a ledger and n denotes the number of ledgers in the bank. This is a more complicated and important invariant that enables the banking system to prevent malicious acts by monitoring the activities of its employees. For instance, if an employee wants to credit a given amount of funds to his own account, then he has to debit the same amount from another account, rather than just modifying the account's master file. This requirement can be expressed as an invariant in OCL.

    context Bank inv:
        self.ledgers → collect(trans.amount → sum) → sum = 0

where collect is a predefined OCL operation on collection types that returns the collection of values obtained by evaluating the expression given as parameter for each element of the collection. The relationships between the collections ledgers, transactions, etc. are as shown in Figure 5.
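As an informal illustration of Rule 3, the following Python sketch mirrors the structure of the OCL expression on a toy object model. It is not the PVS translation used in the thesis. The classes, the `post` helper, and in particular the sign convention (a posting adds +amount to one ledger and -amount to the other, so that the overall sum stays zero) are assumptions introduced for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:
    amount: int  # signed entry; the sign convention is an assumption


@dataclass
class Ledger:
    name: str
    trans: list = field(default_factory=list)


@dataclass
class Bank:
    ledgers: list


def overall_balance(bank):
    """Mirrors: self.ledgers -> collect(trans.amount -> sum) -> sum."""
    per_ledger = [sum(t.amount for t in ledger.trans) for ledger in bank.ledgers]
    return sum(per_ledger)


def post(amount, credit_ledger, debit_ledger):
    """Hypothetical double-entry posting to a pair of ledgers."""
    credit_ledger.trans.append(Transaction(amount))
    debit_ledger.trans.append(Transaction(-amount))


# Usage: postings made through `post` keep the invariant of Rule 3,
# whereas a one-sided entry is detected as a non-zero overall balance.
drawer, credit, debit = Ledger("drawer"), Ledger("credit"), Ledger("debit")
bank = Bank([drawer, credit, debit])
post(100, drawer, debit)      # e.g. a deposit
post(40, credit, drawer)      # e.g. a withdrawal
balanced = overall_balance(bank) == 0
drawer.trans.append(Transaction(25))   # one-sided modification
tampered = overall_balance(bank) != 0
```

The one-sided entry at the end plays the role of an employee modifying a single record without the matching counter-entry, which is the kind of act the invariant is meant to expose.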
This invariant (Rule 3) is translated to a conjecture in the PVS specification (see Theorem 3.1) and checked directly using the PVS theorem prover. The invariant is supposed to hold after completion of each transaction in online processing, or daily in batch processing. It significantly improves the security mechanism of the banking system by allowing monitoring of its overall balance. We specify a number of ledgers for recording different types of transactions. To simplify our discussion, we assume that the bank contains only three ledgers, namely:

- a drawer ledger for recording transactions affecting the amount of cash in the drawer;
- a credit ledger for recording transactions that affect the credit of the bank; and
- a debit ledger for recording transactions that affect the debit of the bank.

Note that the sets of transactions recorded in the ledgers are not mutually disjoint. When a transaction is successfully completed, it is processed and added to a pair of relevant ledgers. For instance, a deposit transaction is added to the drawer ledger to reflect the increment of cash in the drawer, and at the same time to the debit ledger to reflect the increment in the debit from the bank, i.e. the amount the bank owes its customers.

Rule 4: The system must not allow withdrawal of an amount of funds that makes the balance on the account less than the pre-agreed creditLimit - a fixed amount of funds that the customer can withdraw in debt disregarding ongoing transactions. For customers without such an agreement, creditLimit is equal to zero. Moreover, if a withdrawal is successfully completed, the balance on the account must be updated.
These requirements are specified as pre- and post-conditions on the withdraw operation as follows:

    context Account :: withdraw(amount : nat) : nat
        pre: self.balance − amount ≥ self.creditLimit
        post: self.balance = self.balance@pre − amount

where balance@pre indicates the value of the variable balance at the start of the execution of the operation. A pre-condition on an operation corresponds to a guard condition on a state transition that must be fulfilled for the transition to be fired. State transitions must preserve local invariants, but a state transition may be undesirable globally. That is, when a transition is fired, the effect of the actions associated with the transition may lead to undesirable behavior. For instance, transferring funds to a wrong account number is possible as long as the pre- and post-conditions are fulfilled. That is, the pre- and post-conditions are necessary but not sufficient to enforce such requirements.

Rule 5: If a person is both a customer and an employee of a bank, then the person must not be allowed to modify his own account. This requirement is related to the separation of duties security design principle. To enforce this requirement, every employee must be identified uniquely, for instance by a combination of a social security number and a password, and the set of accounts that the employee can update must be specified. This requirement is expressed in OCL as follows:

    context Person inv:
        self.updates → excludes(self.owns)

where excludes is a predefined OCL operation, and the updates attribute contains the set of accounts an employee can modify (see Section 3.4 for more discussion).

Rule 6: After a successful withdrawal transaction, the effect of the withdrawal must be reflected on the account by updating its balance before the cash is dispensed. What if the cash dispenser fails to deliver the cash after the balance is updated?
This is an instance of the transaction integrity problem, which can be handled by a new transaction that reestablishes the correct balance. In general, transactions can be kept in a journal until they are processed and added to appropriate ledgers by batch processing modules during the night. In our example, however, we assume that a transaction is put into ledgers immediately after it is successfully completed. System properties described in OCL expressions are integrated into the PVS specifications generated from the UML models and verified using the PVS toolkit.

Rule 7: For any account, at most one ATM session can be associated with the account at any given time. This requirement prevents concurrent withdrawals from the same account by requiring uniqueness of an ATM session. This can be implemented by updating the balance on the account before a new ATM session can be started.

    context ATMSession inv:
        self.allInstances → forAll(s1, s2 | s1 <> s2 implies s1.account <> s2.account)

where allInstances and → are predefined OCL operations on types and object collections respectively.

Rule 8: The balance on an account is equal to the difference between the sum of deposited funds and the sum of withdrawn funds. This constraint can be specified as an invariant expressed in OCL, translated into a conjecture in PVS, and discharged.

    context Account inv:
        self.balance = (self.trans → select(transKind = deposit)) → collect(trans.amount) → sum
                     − (self.trans → select(transKind = withdraw)) → collect(trans.amount) → sum

where select and collect are OCL operations and trans is the list of transactions performed on the account object. The select operation returns the sub-list of trans for which the boolean expression is true. The collect operation derives a collection of objects whose type differs from that of the original collection. Here it returns a bag of natural numbers, i.e. the amounts associated with the transactions selected.
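The way select and collect combine in Rule 8 can be sketched with ordinary list operations. The Python fragment below is an informal analogue, not the PVS conjecture generated by the PrUDE tool. The `Transaction` record, the string values of `trans_kind`, and the helper names are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    trans_kind: str  # "deposit" or "withdraw" (assumed encoding)
    amount: int      # natural number


def select(items, predicate):
    """Analogue of OCL select: keep the elements satisfying the predicate."""
    return [x for x in items if predicate(x)]


def collect(items, expression):
    """Analogue of OCL collect: evaluate the expression for each element."""
    return [expression(x) for x in items]


def expected_balance(trans):
    """Right-hand side of Rule 8: sum of deposits minus sum of withdrawals."""
    deposits = collect(select(trans, lambda t: t.trans_kind == "deposit"),
                       lambda t: t.amount)
    withdrawals = collect(select(trans, lambda t: t.trans_kind == "withdraw"),
                          lambda t: t.amount)
    return sum(deposits) - sum(withdrawals)


def satisfies_rule_8(balance, trans):
    """True when the recorded balance agrees with the transaction history."""
    return balance == expected_balance(trans)


# Usage with a hypothetical transaction history.
history = [Transaction("deposit", 100),
           Transaction("withdraw", 30),
           Transaction("deposit", 5)]
```

Note that `collect` returns a list that may contain repeated amounts, which corresponds to the bag returned by the OCL operation rather than a set.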
The sum operation returns the total of the amounts in the collection of transactions to which it is applied.

3.3 Formal Analysis Using the PrUDE Tool

The main purpose of integrating semi-formal modeling techniques with formal methods (FMs) is to exploit the mathematical foundation underlying FMs in reasoning about the correctness of the graphical models. This requires translating graphical UML models and OCL constraints into PVS specifications to make them amenable to rigorous analysis. The translation of UML models is based on the semantic definitions we proposed for UML notations [1, 3, 4, 38], which are implemented in the PrUDE tool [5] to support automatic translation of UML models into formal specifications in PVS. The translation of OCL expressions into PVS is rather straightforward, since OCL is based on first-order logic and PVS is based on higher-order logic.

The formal system development process using the PrUDE platform consists of the following major steps.

• Analysis and design of a system using UML modeling techniques. In this step, structural and behavioral properties of major system components, relationships between the components, and possible interactions between them are described using the UML modeling techniques and notations. Any UML CASE tool that supports model exchange in the XMI format can be used to automate this step. In the sequel, the ArgoUML [17] tool is used.

• PVS specifications are obtained by translating the UML models, and are rigorously analyzed using the verification mechanisms and tools provided by the PVS environment in order to prove that the specifications satisfy the requirements. If an error is discovered during this step, e.g. if type-checking fails, then the above steps are repeated until an error-free UML model is obtained.

• When a valid, i.e. well-formed, UML model is obtained, the developer proceeds with the implementation and code generation in a language of interest.
Most UML CASE tools support generation of code skeletons in programming languages such as Java and C++.

Specifications of generic properties of UML models, e.g. the well-formedness constraints, can be captured by the semantic definitions for UML notations and obtained from the translation of UML models into PVS. The resulting PVS specifications are analyzed using the PVS verification tools such as the type-checker, the theorem-prover and the model-checker. The PVS specification shown in appendix B is, for instance, automatically generated from the sequence diagram shown in Figure 6 using the PrUDE tool.

The following are examples of generic properties of UML models. These properties follow from the well-formedness constraints put on UML models.

• For every object involved in a given interaction that is specified by a sequence diagram, its class should be specified in at least one class diagram.

• For a given class and a statechart diagram describing its life cycle, an operation that triggers a state transition must be in the set of methods of the class.

As mentioned previously, application-specific properties should be added directly to the PVS specification. For instance, the invariant stated as Theorem 3.1 specifies the requirement that, for every transaction and bank, processing of the transaction, i.e. its addition to a pair of appropriate ledgers, must preserve the overall balance of the bank (see Rule 3 in Section 3.2.4). To specify and verify this requirement, we start with the declarations of the transaction, ledger, and bank types. These declarations are extracted from the PVS specification resulting from the translation of the UML models.
Note that the excerpt from the PVS specification contains only the minimal information necessary for the following discussion.

    TransactionKind : TYPE+ = {deposit, withdraw, transfer}
    Transaction : TYPE+ = [# transId: int, transKind: TransactionKind, amount: nat #]
    Ledger : TYPE+ = [# kind : LedgerKind, trans : list[Transaction] #]
    Bank : TYPE+ = [# accounts: setof[Account], drawer : Ledger, credit : Ledger, debit : Ledger #]

A bank consists of a set of accounts and three ledgers for recording the cash in the drawer, the credit, and the debit of the bank. A ledger consists of a list of transactions in the order of their occurrence. Every transaction carries an amount of funds. The recursive function sum_ledger computes the sum of the amounts of funds associated with the list of transactions given as a parameter. When the PVS specification was type-checked, a TCC was generated in order to ensure termination of the recursion. The TCC was discharged automatically using the theorem-prover command (grind).

    sum_ledger(lt: list[Transaction]) : RECURSIVE nat =
      CASES lt OF
        null : 0,
        cons(t, lt1) : amount(t) + sum_ledger(lt1)
      ENDCASES
    MEASURE length(lt)

The predicate balanced?() defined on the Bank type states the condition that must hold when a bank is in the balanced state, i.e. the sum of all ledgers is equal to zero.

    b : VAR Bank
    balanced?(b): bool =
      sum_ledger(trans(drawer(b))) + sum_ledger(trans(credit(b)))
        + sum_ledger(trans(debit(b))) = 0

Processing of a transaction means adding a successfully completed transaction to a pair of ledgers, depending on the kind of the transaction. More specifically, the transaction is added to the list of transactions in each of the two ledgers. It may be necessary to alter the amount associated with the transaction, for instance, when a withdrawal transaction is added to the drawer ledger.
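The ledger sum, the balanced state, and the processing step just described can also be explored with an executable sketch. This is a simplified illustration under stated assumptions, not the PVS specification: each ledger is reduced to a tuple of transactions, amounts are plain integers so that a negated amount is representable, and all Python names (Transaction, Bank, sum_ledger, balanced, neg, process_trans, check_preservation) are hypothetical.

```python
from dataclasses import dataclass, replace
from itertools import product
from typing import Tuple


@dataclass(frozen=True)
class Transaction:
    trans_id: int
    trans_kind: str  # "deposit", "withdraw" or "transfer"
    amount: int      # assumption: int rather than nat, so negation is defined


Ledger = Tuple[Transaction, ...]


@dataclass(frozen=True)
class Bank:
    drawer: Ledger = ()
    credit: Ledger = ()
    debit: Ledger = ()


def sum_ledger(trans: Ledger) -> int:
    """Recursive sum of the amounts in a ledger (empty ledger sums to 0)."""
    if not trans:
        return 0
    head, *tail = trans
    return head.amount + sum_ledger(tuple(tail))


def balanced(b: Bank) -> bool:
    """A bank is balanced when its three ledgers sum to zero."""
    return sum_ledger(b.drawer) + sum_ledger(b.credit) + sum_ledger(b.debit) == 0


def neg(t: Transaction) -> Transaction:
    """Copy of t with the amount negated."""
    return replace(t, amount=-t.amount)


def process_trans(t: Transaction, b: Bank) -> Bank:
    """Add t to a pair of ledgers, negated in one of them."""
    if t.trans_kind == "withdraw":
        return replace(b, drawer=(neg(t),) + b.drawer, credit=(t,) + b.credit)
    if t.trans_kind == "deposit":
        return replace(b, drawer=(t,) + b.drawer, debit=(neg(t),) + b.debit)
    return b  # other kinds leave the bank unchanged in this sketch


def check_preservation(amounts=range(0, 4)) -> bool:
    """Spot check: a balanced bank stays balanced after each processed transaction."""
    bank = Bank()
    kinds = ("deposit", "withdraw", "transfer")
    for i, (kind, amount) in enumerate(product(kinds, amounts)):
        assert balanced(bank)
        bank = process_trans(Transaction(i, kind, amount), bank)
    return balanced(bank)


print(check_preservation())
```

Such a run only samples a few transaction sequences; the theorem proved in PVS quantifies over all transactions and banks.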
The auxiliary function neg(), which negates the amount of a transaction, was defined to alter amounts in this way, whereas the function processTrans() specifies the processing of transactions.

    t : VAR Transaction
    neg(t) : Transaction = t WITH [amount := -amount(t)]
    processTrans(t,b) : Bank =
      IF transKind(t) = withdraw THEN
        b WITH [drawer := drawer(b) WITH [trans := cons(neg(t), trans(drawer(b)))],
                credit := credit(b) WITH [trans := cons(t, trans(credit(b)))]]
      ELSE IF transKind(t) = deposit THEN
        b WITH [drawer := drawer(b) WITH [trans := cons(t, trans(drawer(b)))],
                debit := debit(b) WITH [trans := cons(neg(t), trans(debit(b)))]]
      ELSE b ENDIF
      ENDIF

where WITH is a PVS construct for overriding the values of fields of a record. Since the effect of processing a transfer transaction is the same as that of a withdraw transaction, it is not considered in the definition of the processTrans() operation. The definition of processTrans() is based on the assumption that a transaction is processed immediately after it is completed; otherwise the operation would have been recursive. Now let us specify the requirement as a theorem and prove it by invoking the PVS theorem-prover.

Theorem 3.1 For any transaction t and bank b, processing of the transaction preserves the overall balance of the bank. In other words, if the bank is in a balanced state and a transaction is successfully processed, then the bank remains balanced. Symbolically,

    thm2: THEOREM FORALL t,b: balanced?(b) => balanced?(processTrans(t,b))

The following is a slightly reformatted excerpt from a proof of the theorem generated by the PVS toolkit.
    thm2 :
    {1}   FORALL t, b: (balanced?(b) => balanced?(processTrans(t,b)))

    Trying repeated skolemization, instantiation, and if-lifting, then
    Expanding the definition of sum_ledger, and then
    Expanding the definition of processTrans,
    this simplifies to:
    thm2 :
    {-1}  (CASES trans(credit(b!1)) OF
             null: 0, cons(t, lt1): amount(t) + sum_ledger(lt1) ENDCASES)
          + (CASES trans(debit(b!1)) OF
               null: 0, cons(t, lt1): amount(t) + sum_ledger(lt1) ENDCASES)
          + (CASES trans(drawer(b!1)) OF
               null: 0, cons(t, lt1): amount(t) + sum_ledger(lt1) ENDCASES)
          = 0
      |-------
    {1}   (CASES (IF transKind(t!1) = withdraw
                  THEN cons(t!1, trans(credit(b!1)))
                  ELSE b!1`credit`trans ENDIF) OF
             null: 0, cons(t, lt1): amount(t) + sum_ledger(lt1) ENDCASES)
          + (CASES (IF transKind(t!1) = withdraw
                    THEN b!1`debit`trans
                    ELSE cons(neg(t!1), trans(debit(b!1))) ENDIF) OF
               null: 0, cons(t, lt1): amount(t) + sum_ledger(lt1) ENDCASES)
          + (CASES (IF transKind(t!1) = withdraw
                    THEN cons(neg(t!1), trans(drawer(b!1)))
                    ELSE cons(t!1, trans(drawer(b!1))) ENDIF) OF
               null: 0, cons(t, lt1): amount(t) + sum_ledger(lt1) ENDCASES)
          = 0

    Lifting IF-conditions to the top level,
    thm2 :
    {-1}  IF null?(trans(credit(b!1)))
          THEN (0
                + (CASES trans(debit(b!1)) OF
                     null: 0, cons(t, lt1): amount(t) + sum_ledger(lt1) ENDCASES)
                + (CASES trans(drawer(b!1)) OF
                     null: 0, cons(t, lt1): amount(t) + sum_ledger(lt1) ENDCASES))
               = 0
          ELSE amount(car(trans(credit(b!1))))
               + sum_ledger(cdr(trans(credit(b!1))))
               + (CASES trans(debit(b!1)) OF
                    null: 0, cons(t, lt1): amount(t) + sum_ledger(lt1) ENDCASES)
               + (CASES trans(drawer(b!1)) OF
                    null: 0, cons(t, lt1): amount(t) + sum_ledger(lt1) ENDCASES)
               = 0
          ENDIF
      |-------
    {1}   IF transKind(t!1) = withdraw
          THEN (CASES cons(t!1, trans(credit(b!1))) OF
                  null: 0, cons(t, lt1): amount(t) + sum_ledger(lt1) ENDCASES)
               + (CASES b!1`debit`trans OF
                    null: 0, cons(t, lt1): amount(t) + sum_ledger(lt1) ENDCASES)
               + (CASES cons(neg(t!1), trans(drawer(b!1))) OF
                    null: 0, cons(t, lt1): amount(t) + sum_ledger(lt1) ENDCASES)
               = 0
          ELSE (CASES b!1`credit`trans OF
                  null: 0, cons(t, lt1): amount(t) + sum_ledger(lt1) ENDCASES)
               + (CASES cons(neg(t!1), trans(debit(b!1))) OF
                    null: 0, cons(t, lt1): amount(t) + sum_ledger(lt1) ENDCASES)
               + (CASES cons(t!1, trans(drawer(b!1))) OF
                    null: 0, cons(t, lt1): amount(t) + sum_ledger(lt1) ENDCASES)
               = 0
          ENDIF

    Trying repeated skolemization, instantiation, and if-lifting,
    This completes the proof of thm2.
    Q.E.D.

3.4 Model-based V&V in Making Design Decisions

The UML standard document [28] states that associations on base classes are inherited by their subclasses. We briefly discuss this issue, present a concrete example of a deviation in designers’ understanding of the issue, and illustrate how the proposed development framework may assist developers in making design decisions in cases where the semantics of the UML notations is ambiguous and/or inconsistent with the intuitive informal semantics.

In UML, the semantics of the specialization/generalization relationship between classifiers satisfies Liskov’s substitutability principle [25], stated as follows: If S is a subtype of type T, then objects of T in a program may be substituted with objects of type S without altering the desired properties of the program, e.g. its correctness. In other words, if p(x) is a property provable about an element x of type T, then p(y) should be true for an element y of type S.

Let us consider the specialization/generalization hierarchy of the classes extracted from the class diagram shown in Figure 5, modified and refined as shown in Figures 8 and 9 to suit the discussion in this section. When applied to the inheritance hierarchy shown in Figure 8, Liskov’s substitutability principle states that objects of the specialized classes, namely the Employee and Customer classes, are substitutable for objects of the base class Person.
In other words, the associations between the classes Person and Account are inherited by the subclasses Customer and Employee of the class Person. That means both subclasses are associated with the class Account by the two associations they inherit from the base class. In PVS semantic models, we specify the inheritance hierarchy by representing classes and subclasses as PVS types and subtypes, respectively. Subtyping satisfies Liskov’s substitutability principle.

    Person : TYPE+
    Employee : TYPE+ FROM Person
    Customer : TYPE+ FROM Person
    p : VAR Person
    b : VAR Bank
    acc : VAR Account

Moreover, the semantics of the inheritance relationship requires that the sets of objects of specialized classes are mutually disjoint in the sense that they cannot have a common subclass. This property does not automatically follow from the specification of subclasses as uninterpreted subtypes declared above. Hence, we need to explicitly specify this property as a constraint on the metamodel (see axiom disjoint_ax in the corePackage theory in appendix A).

Figure 8: Associations in Inheritance Hierarchy

There are two associations between the classes Person and Account (see Fig. 8): the updates association, which captures the relationship between an account and a bank employee; and the owns association, which specifies a relationship between an account and a bank customer. Specialized classes inherit both the structure and the behavior of the base class. Note that the two associations may not be mutually disjoint, i.e. a single person can be associated with an account both as a customer and as an employee (at least at this point), in which case additional restrictions may apply to the set of accounts such a person may update. More specifically, a person should not be allowed to modify his own account. According to the semantics of inheritance in UML notations, an association involving a base class is inherited by all its subclasses.
This means, referring to Figure 8, that the subclasses Employee and Customer inherit the two associations owns and updates from the base class Person. A person is said to be associated with a bank as an employee if there exists an account in the bank which the person may update. A person is said to be associated with a bank as a customer if there exists an account in the bank which the person owns. We specify the associations and their properties as follows.

    owns : [Person -> set[Account]]
    updates : [Person -> set[Account]]
    uses : [Bank -> set[Person]]
    worksfor : [Bank -> set[Person]]
    worksfor_ax: AXIOM
      (FORALL p,b: worksfor(b)(p) IFF (EXISTS acc: accounts(b)(acc) AND updates(p)(acc)))
    uses_ax: AXIOM
      (FORALL p,b: uses(b)(p) IFF (EXISTS acc: accounts(b)(acc) AND owns(p)(acc)))

Based on the above axioms, let us specify and verify the property stated as business Rule 5 in section 3.2.4.

Theorem 3.2 If a person p is an employee and a customer of a bank b, then the person must not be allowed to update an account acc which (s)he owns. Symbolically,

    thm6: THEOREM
      (FORALL p,b,acc: (worksfor(b)(p) AND uses(b)(p))
         IMPLIES NOT (owns(p)(acc) IFF updates(p)(acc)))

An attempt to prove the above theorem by invoking the PVS theorem prover was unsuccessful and resulted in two unprovable subgoals: thm6.1, an unproved sequent with several antecedents and no consequents; and thm6.2, a sequent whose consequents contradict the consequent of the original goal. The counterexamples are given as PVS debugging messages, which indicate that either the antecedents are inconsistent, or they are insufficient to prove the sequent.

    thm6 :
      |-------
    {1}   (FORALL p,b,acc: (worksfor(b)(p) AND uses(b)(p))
             IMPLIES NOT (owns(p)(acc) IFF updates(p)(acc)))

    Rule? (grind :theories ("inheritance"))
    Trying repeated skolemization, instantiation, and if-lifting,
    this yields 2 subgoals:
    thm6.1 :
    {-1}  GeneralizableElement_pred(p!1)
    {-2}  Classifier_pred(p!1)
    {-3}  Class_pred(p!1)
    {-4}  Person_pred(p!1)
    {-5}  owns(p!1)(acc!1)
    {-6}  updates(p!1)(acc!1)
      |-------

    Rule? (postpone)
    Postponing thm6.1.

    thm6.2 :
    {-1}  GeneralizableElement_pred(p!1)
    {-2}  Classifier_pred(p!1)
    {-3}  Class_pred(p!1)
    {-4}  Person_pred(p!1)
      |-------
    {1}   owns(p!1)(acc!1)
    {2}   updates(p!1)(acc!1)

    Rule? quit

    Run time = 1.45 secs.
    Real time = 50.58 secs.

A closer investigation of the axioms reveals that the antecedents are insufficient to prove the sequent. That is, it cannot be concluded from the specified axioms whether or not a person who can update an account is different from the one who owns it. Hence, we need to analyze the UML class diagram, since this contradicts the intended/required property of the system. A solution is to specify the two associations owns and updates between the specialized classes Customer and Employee, respectively, and the class Account. We capture the desired property by specifying an {xor} (exclusive or) constraint, which is predefined in UML, on the two associations (see Figure 9). The {xor} constraint specifies that any instance of the class Account is associated either with an instance of the class Customer by the association owns or with an instance of the class Employee by the association updates, but not both. The {xor} constraint is translated to the following axiom in the PVS specification.

Figure 9: Associations in Inheritance Hierarchy (class diagram with the classes Person, Employee, Customer, and Account; association labels updates and uses; multiplicities 1..* and *; constraint {xor})

    xor_ax: AXIOM (FORALL acc: (owns(c)(acc) XOR updates(e)(acc)))

By including axiom xor_ax in the PVS specification (see appendix E), theorem thm6 was discharged automatically by invoking the PVS prover with the single command (grind :theories ("inheritance")).
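The effect of the {xor} constraint on thm6 can be illustrated by brute-force enumeration over a very small model. The sketch below is an assumption-laden illustration, not a substitute for the PVS proof: it fixes one person and one hypothetical bank with two accounts, treats every account as belonging to that bank, and reads the xor axiom as applying to that same person. The names ACCOUNTS, thm6, xor_ax and counterexamples are introduced here for illustration only.

```python
from itertools import product

ACCOUNTS = (0, 1)  # hypothetical bank with two accounts


def thm6(owns, updates):
    """thm6 for one person: worksfor and uses imply NOT (owns IFF updates) per account."""
    works_for = any(updates[a] for a in ACCOUNTS)  # some account the person updates
    uses = any(owns[a] for a in ACCOUNTS)          # some account the person owns
    if not (works_for and uses):
        return True
    return all(owns[a] != updates[a] for a in ACCOUNTS)


def xor_ax(owns, updates):
    """Every account is either owned or updated by the person, but not both."""
    return all(owns[a] != updates[a] for a in ACCOUNTS)


def counterexamples(assume_xor):
    """Enumerate all owns/updates assignments and collect those violating thm6."""
    found = []
    n = len(ACCOUNTS)
    for bits in product((False, True), repeat=2 * n):
        owns, updates = bits[:n], bits[n:]
        if assume_xor and not xor_ax(owns, updates):
            continue  # assignment excluded by the axiom
        if not thm6(owns, updates):
            found.append((owns, updates))
    return found


print(len(counterexamples(assume_xor=False)), len(counterexamples(assume_xor=True)))
```

Without the axiom, the enumeration finds assignments in which the person both owns and updates some account, or does neither for some account, matching the shapes of the two failed subgoals; with the axiom, no violating assignment remains in this small model.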
The PVS prover output for thm6 with xor_ax included is:

    thm6 :
      |-------
    {1}   (FORALL p,b,acc: (worksfor(b)(p) AND uses(b)(p))
             IMPLIES NOT (owns(p)(acc) IFF updates(p)(acc)))

    Trying repeated skolemization, instantiation, and if-lifting,
    This completes the proof of thm6.
    Q.E.D.

This example shows how formal V&V can reveal subtle errors (omissions, inconsistencies, etc.) in UML models, which may not be discovered otherwise, and how log messages can help us to reconsider our design decisions. Although the detected error might seem trivial, it is typical of errors that can easily be overlooked during the design phase until it is too late and costly to fix them.

3.5 Discussions

Generic correctness requirements on UML models are specified and automatically verified by implementing the well-formedness rules (WFRs) defining the UML static semantics in the PrUDE tool. Application-specific requirements should, however, be specified during the development process, and this requires a certain amount of developer interaction with the PrUDE platform; thus full automation of the verification process is not realistic.

System models are expressed in UML notations, whereas additional constraints on models are captured by either OCL or OUN expressions. The ADAPT-FT project integrates UML, OUN and PVS into a platform for the formal development of open distributed systems (ODS). In the PrUDE tool, however, OCL is used instead of OUN to enhance the UML notations. The UML models and the constraints expressed in OUN or OCL are translated to PVS to take advantage of the PVS theorem-proving facilities in verifying the correctness of the UML models [1, 3, 4].

The PrUDE platform relies on UML for modeling, on OCL for specifying constraints on the models, and on PVS [30] for consistency checking and verification of the specifications. It allows developers to interactively insert assertions directly using the PVS editor.
This seems contrary to the main purpose of integrating formal methods with graphical modeling techniques, namely, hiding the processing of formal software artifacts from practitioners. However, as stated in [6], complete automation of the translation of semi-formal models into formal specifications is unlikely, since the informal descriptions are inherently incomplete. Most generative translations result in only skeletons of formal specifications and require the specifiers to provide additional details to complete the semantic models. Hence, translation of UML models into PVS results in a skeleton of a formal specification that is neither ’complete’ nor detailed enough to perform a meaningful verification of the properties of the system in question. The level of detail of the formal specifications generated from the UML models depends directly on the information available in the UML models and on the detail of the semantic definitions implemented in the CASE tool automating the translation.

The PrUDE tool is developed based on the formal semantic definitions we proposed for a subset of the UML notations. Even if semantics for the whole of the UML notations were defined and implemented in the platform, it would be impossible to capture all application-specific properties, although some generic properties can be implemented in the platform and instantiated in applications. Hence, allowing users to add system properties is essential for performing a meaningful verification and makes the PrUDE platform more flexible. This feature seems to contradict the very purpose of developing the integrated platform and the supporting tool. This issue can be addressed in one or more of the following ways:
- Formalize generic domain-specific properties and implement them;
- Use more user-friendly and intuitively understandable specification languages, such as the tabular notation [32, 19], that have semantic definitions in PVS; and
- Define and implement suitable proof strategies that capture domain-specific properties.

The separation of the generic semantic theory from model-specific definitions allows the development of a meta-theory and proof strategies for UML models, which are useful for reducing users’ interaction with verification tools.

Another issue that needs further consideration is the communication of the results of formal verification using PVS tools to developers who may not have knowledge of the PVS environment. In the current version of the PrUDE tool, results from the PVS verification tools are reported as plain text. The main challenge is to present the feedback from the PVS tool, e.g. an error message from type-checking or theorem-proving, in such a way that developers can trace the cause of errors back to the UML models they have created and identify the model elements containing the errors. Such a mechanism is crucial for the practical usability of the proposed development framework and its tool. A preliminary investigation shows that it is feasible to achieve this by recording the information necessary to re-engineer the UML models from the PVS specifications. For instance, preserving the system vocabulary across the graphical models and formal specifications significantly improves practitioners’ understanding of feedback from the verification step. Moreover, encoding model information in a notation that preserves the structure of UML models can improve developers’ understanding while still representing sufficient information about model elements. An alternative approach is to implement an ’intelligent’ parser that can interpret the log file generated by the PVS verification tools. Even though the error messages might indicate the cause of errors in the UML models, they are not sufficiently detailed.
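To make the idea of interpreting the prover log more concrete, the following is a purely hypothetical sketch of how simple sequent formulas of the shape shown in section 3.4 (for example `{-5} owns(p!1)(acc!1)`) could be turned into readable text. It is not the PrUDE implementation and handles only this narrow formula shape; the regular expression and the names FORMULA, parse_sequent, to_english and explain are assumptions made for illustration.

```python
import re

# Matches "{label} name(x!n)" with an optional second argument "(y!m)".
FORMULA = re.compile(r"\{(-?\d+)\}\s+(\w+)\((\w+)!\d+\)(?:\((\w+)!\d+\))?")


def parse_sequent(log_text):
    """Split matched formulas into antecedents (negative labels) and consequents."""
    antecedents, consequents = [], []
    for match in FORMULA.finditer(log_text):
        label, name, first, second = match.groups()
        formula = (name, first) if second is None else (name, first, second)
        (antecedents if int(label) < 0 else consequents).append(formula)
    return antecedents, consequents


def to_english(formula):
    """Render a parsed formula using the vocabulary of the UML model."""
    if len(formula) == 3:
        relation, subject, obj = formula
        return f"{subject} {relation} {obj}"
    predicate, subject = formula
    kind = predicate[:-5] if predicate.endswith("_pred") else predicate
    return f"{subject} is a {kind}"


def explain(log_text):
    """Produce one line of plain text per formula of the sequent."""
    antecedents, consequents = parse_sequent(log_text)
    lines = ["Assumed: " + to_english(f) for f in antecedents]
    lines += ["To show: " + to_english(f) for f in consequents]
    if not consequents:
        lines.append("Nothing left to show: the assumptions would have to be contradictory.")
    return lines


SAMPLE = """thm6.1 :
{-4} Person_pred(p!1)
{-5} owns(p!1)(acc!1)
{-6} updates(p!1)(acc!1)
|-------
"""

for line in explain(SAMPLE):
    print(line)
```

Because the Skolem constants keep the names of the quantified variables, output of this kind stays close to the vocabulary of the UML model, which is the property the preceding discussion identifies as helpful for practitioners.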
In the future, we plan to implement an ’intelligent’ parser that extracts textual, English-only messages from the raw PVS log messages.

4 Conclusion and Future Work

Our framework relies on PVS [30] as a formalism for verification of specifications. Basic modeling constructs and constraints on UML diagrams can be expressed formally in the PVS specification language in terms of functions and abstract data types [2]. Our approach to consistency checking was described in [40], where software specification is done in a development framework that integrates UML and the PVS toolkit. A combined use of the different UML viewpoints improves the integrity and completeness of system models, which in turn provides a firm foundation for better design and implementation decisions.

By integrating semi-formal modeling notations with formal methods (FMs), we have taken a step towards exploiting the mathematical foundation underlying the FMs for rigorous analysis. This requires translation of UML models into PVS specifications that are amenable to rigorous analysis. The translation is based on the semantic definitions we proposed in [1, 3, 4, 38] and provides the necessary link for reasoning about the UML models. The PrUDE tool automates most of the translation into PVS specifications of UML models developed with UML tools that support data exchange in the XMI format. The PVS toolkit allows us to perform conformance checks of the semantic models, as illustrated in section 3.

It is not feasible to implement all application-specific properties in a CASE tool, as such properties will not be available before the development process starts. Generic properties, however, can be implemented in CASE tools. Hence, allowing users to add domain-specific properties is essential for performing a meaningful verification, possibly guided by users. Moreover, this feature makes the PrUDE tool flexible and useful to a wider group of users.
The fact that system designers are allowed to specify system properties in PVS seems to contradict the very purpose of developing the integrated framework and the supporting tools: minimizing users’ interaction with verification tools. This issue can be addressed by using a user-friendly specification language such as the tabular notation [32] and by identifying a number of proof strategies for application-specific properties, so as to minimize users’ interaction with the theorem-prover.

Another issue that needs further consideration is how to communicate feedback from the PVS toolkit to developers who may not be experts in the PVS environment. One possible approach is to implement an ’intelligent’ parser that interprets the output from the PVS verification tools and enables the developer to navigate the model to identify the source of errors.

We presented an integrated development framework and a supporting tool, and illustrated how it can be used in the development of critical applications. We strongly believe that integrating formal methods with a well-accepted visual modeling language like UML into a development process improves system reliability and the clarity of the meaning of the modeling elements. The main contribution of our work is a precise representation of UML models, obtained by translating them into PVS specifications and performing rigorous analysis. The interpretation of the feedback from the PVS verification tools in terms of the UML model still needs to be addressed. This transformation is crucial for communicating the results of formal analysis to software practitioners who may not be familiar with the PVS environment. A significant limitation of our framework is that when a proof fails there is no real explanation of the cause in the context of the UML models.

Acknowledgements

We would like to thank Dr. Issa Traoré for reviewing earlier versions of this report and for his invaluable comments.

References

[1] D. Aredo, I. Traoré, and K. Stølen.
An Outline of PVS Semantics for UML Class Diagrams (extended abstract). In the Proc. of The 11th Nordic Workshop on Programming Theory NWPT’99, Uppsala, Sweden, October 6-8, 1999.
[2] D. B. Aredo. Formalization of UML Class Diagrams in PVS (Extended Abstract). In the Proc. of Workshop on Rigorous Modeling and Analysis with the UML: Challenges and Limitations, at OOPSLA99, Denver, Colorado, USA, November 2, 1999.
[3] D. B. Aredo. A Framework for Semantics of UML Sequence Diagrams in PVS. Journal of Universal Computer Science (JUCS), Know-Center in cooperation with Springer Pub. Co., Joanneum Research and the IICM, Graz University of Technology, 8(7):674–697, July 2002.
[4] D. B. Aredo. Semantics of UML Statecharts in PVS. In the Proc. of 7th World Multiconference on Systemics, Cybernetics and Informatics (SCI2003), Orlando, Florida, USA, July 27-30, 2003.
[5] M. Belaid and I. Traoré. The Precise UML Development Environment (PrUDE) Reference Guide. Technical Report ECE01-2, Department of Electrical and Computer Eng., University of Victoria, April 2001.
[6] J.-M. Bruel. Integrating Formal and Informal Specification Techniques. Why? How? In Overview of Panel discussion on International Workshop on Industrial Strength Formal Techniques, Vancouver, Canada, October 22, 1998. Panelists: B. Cheng, S. Easterbrook, R. B. France, and B. Rumpe.
[7] D. D. Clark and D. R. Wilson. Comparison of Commercial and Military Computer Security Policies. In Proc. of the 1987 IEEE Symposium on Security and Privacy, pages 184–195, Oakland, California, USA, April 27-29, 1987.
[8] M. Clavel, F. Durán, S. Eker, P. Lincoln, N. Martí-Oliet, J. Meseguer, and J. F. Quesada. Maude: Specification and Programming in Rewriting Logic. Theoretical Computer Science, 285(2):187–243, August 2002.
[9] O.-J. Dahl and O. Owe. Formal Methods and the RM-ODP. Research report No. 261, March 1998. Department of Informatics, University of Oslo, Norway.
[10] W. Damm and D. Harel.
LSC’s: Breathing Life into Message Sequence Charts. In Formal Methods for Open Distributed Systems (FMOODS’99), Florence, Italy, February 15-18, 1999.
[11] S. Easterbrook, J. Callahan, and V. Wiels. V&V Through Inconsistency Tracking and Analysis. In the Proc. of International Workshop on Software Specification and Design, Ise-Shima, Japan, April 16-18, 1998.
[12] S. Flake and W. Mueller. Expressing Property Specification Patterns with OCL. In The 2003 International Conference on Software Engineering Research and Practice (SERP’03), pages 595–601, Las Vegas, NV, USA, June 2003. CSREA Press, Las Vegas, NV, USA.
[13] S. Flake and W. Mueller. Formal Semantics of Static and Temporal State-Oriented OCL Constraints. Journal on Software and System Modeling (SoSyM), 2(3):164–186, October 2003.
[14] A. Gargantini and E. Riccobene. Encoding Abstract State Machines in PVS. In Y. Gurevich, P. W. Kutter, M. Odersky, and L. Thiele, editors, Proc. of Abstract State Machines, Workshop, ASM 2000, volume 1912 of Lecture Notes in Computer Science, pages 303–322, Monte Verità, Switzerland, March 19-24, 2000. Springer.
[15] D. Gollmann. Computer Security. John Wiley & Sons Ltd., Baffins Lane, Chichester, West Sussex PO19 1UD, England, 1999.
[16] G. J. Holzmann. Design and Validation of Computer Protocols. Prentice-Hall, 1991.
[17] CollabNet Inc. ArgoUML: A modelling tool for design using UML, 1999-2002. URL: http://argouml.tigris.org/.
[18] ISO. A Formal Description Technique Based on the Temporal Ordering of Observational Behavior, September 1988. ISO Standard 8807.
[19] R. Janicki, D. Parnas, and J. Zucker. Tabular representations in relational documents. In Relational Methods in Computer Science, pages 184–196. Springer-Verlag, 1996.
[20] E. B. Johnsen and O. Owe. A Compositional Formalism for Object Viewpoints. In A. Rensink and B. Jacobs, editors, Formal Methods for Open Object-Based Distributed Systems (FMOODS), pages 45–60. Kluwer Academic Publisher, March 2002.
[21] E. B. Johnsen and O. Owe. Object-oriented specification and open distributed systems. In Olaf Owe, Stein Krogdahl, and Tom Lyche, editors, From Object-Orientation to Formal Methods: Dedicated to the Memory of Ole-Johan Dahl, volume 2635 of Lecture Notes in Computer Science. Springer-Verlag, 2003.
[22] F. Keienburg and A. Rausch. Using XML/XMI for Tool Supported Evolution of UML Models. In the Proc. of the 34th Annual Hawaii International Conference on System Sciences (HICSS-34), Maui, Hawaii, January 3-6, 2001. IEEE Computer Society.
[23] Anneke Kleppe and Jos Warmer. Extending OCL to include Actions. In Andy Evans, Stuart Kent, and Bran Selic, editors, UML 2000 - The Unified Modeling Language. Advancing the Standard. Third International Conference, York, UK, October 2000, Proceedings, volume 1939 of LNCS, pages 440–450. Springer, 2000.
[24] M. Lawford, P. Froebel, and G. Moum. Practical Application of Functional and Relational Methods for the Specification and Verification of Safety Critical Software. In T. Rus, editor, the Proc. of Algebraic Methodology and Software Technology, 8th International Conference, AMAST 2000, Iowa City, Iowa, USA, May 2000, volume 1816 of Lecture Notes in Computer Science, pages 73–88. Springer, 2000.
[25] B. Liskov and J. Wing. A Behavioral Notion of Subtyping. ACM Trans. on Programming Languages and Systems, 16(6):1811–1841, November 1994.
[26] Klasse Objecten. Octopus: OCL Tool for Precise UML Specifications.
[27] Dresden University of Technology. Dresden OCL Toolset.
[28] OMG. OMG Unified Modeling Language Specification, version 1.3, June 1999. OMG standard.
[29] O. Owe and I. Ryl. The Oslo University Notation: A Formalism for Open, Object-Oriented, Distributed Systems. Report No. 270, August 1999. Department of Informatics, University of Oslo, Norway.
[30] S. Owre, J. Rushby, N. Shankar, and F.V. Henke. Formal Verification for Fault-tolerant Architectures: Prolegomena to the design of PVS.
IEEE Transactions On Software Engineering, 21(2):107–125, February 1995. [31] S. Owre, N. Shankar, J. Rushby, and D. W. Stringer-Calvert. PVS System Guide, version 2.3. Computer Science Laboratory, SRI International, Melon Park, CA, September 1999. [32] D. L. Parnas. Tabular Representation of Relations. Technical Report 260, Department of Electrical and Computer Engineering, Telecommunications Research Institute of Ontario, Communications Research Laboratory, 1992. [33] M. Richters and M. Gogolla. On Formalizing the UML Object Constraint Language (OCL) . In Tok Wang Ling, Sudha Ram, and Mong Li Lee, editors, Proc. 17th Int. Conf. Conceptual Modeling (ER’98), volume 1507 of LNCS, pages 449–464. Springer, 1998. [34] J. Rumbaugh, I. Jacobson, and G. Booch. The Umified Modeling Language, Reference Manual. Addison Wesley Longman Inc., 1999. 231 [35] J. Rushby. Specification, proof checking, and model checking for protocols and distributed systems with PVS. In FORTE X/PSTV XVII ’97: Formal Description Techniques and Protocol Specification, Testing and Verification, November 1997. [36] I. Sommerville. Software Engineering. Addison-Wesley, 5th edition, 1996. [37] J. M. Spivey. The Z Notation: A Reference Manual. Prentice-Hall International, 2nd edition, 1992. [38] I. Traoré. An Outline of PVS Semantics for UML Statecharts. Jounal of Universal Computer Science, 6(11):1088–1108, 2000. [39] I. Traoré and D. B. Aredo. Enhancing Structured Review with Model-based Verification. IEEE Transaction on Software Engineering (to appear), April 2004. [40] I. Traoré, D. B. Aredo, and K. Stølen. Tracking Inconsistencies in an Integrated Platform. Research report No. 274, August 1999. Department of Informatics, University of Oslo, Norway. [41] I. Traoré, D. B. Aredo, and H. Ye. An Integrated Framework for Formal Development of Distributed Systems. Journal of Information and Software Technology, Elsevier Science, 46(5):281– 286, April 2004. [42] I. Traoré, A. Jeffroy, M. 
Romdhani, and A.E.K. Sahraoui. An Experience with a Multiformalism Specification of an Avionics System. In the Proc. INCOSE 98, Vancouver, Canada, July 25-31, 1998.
[43] J. B. Warmer et al. Response to the UML2.0 OCL RfP, ver. 1.6, OMG Document ad/2003-01-07, January 2003.
[44] J. B. Warmer and A. G. Kleppe. The Object Constraint Language: Precise Modeling with UML. Addison Wesley Longman Inc., 1999.
[45] J. Whittle. Formal Approach to Systems Analysis Using UML: An Overview. Journal of Database Management, 11(4):4–13, 2000.
[46] J. M. Wing. A Specifier’s Introduction to Formal Methods. IEEE Computer, 23:8–24, September 1990.

A Representation of UML Core Package

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Representation of UML Core Package-(Backbone and Relationships)
%% UML v1.3 standard pp. 2-14 and 2-15
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
corePackage : THEORY
BEGIN

  %%%% TYPE DECLARATIONS %%%%%%%%%%%
  ModelElement: TYPE+
  Feature, GeneralizableElement, Parameter: TYPE+ FROM ModelElement
  Classifier: TYPE+ FROM GeneralizableElement
  Class: TYPE+ FROM Classifier
  StructFeature, BehavoralFeature: TYPE+ FROM Feature
  Attribute: TYPE+ FROM StructFeature
  Operation: TYPE+ FROM BehavoralFeature
  name: [Feature -> string]

  %%%% TYPE DECLARATIONS Core Package - Relationships
  Relationship, AssociationEnd: TYPE+ FROM ModelElement
  Association, Aggregation: TYPE+ FROM Relationship
  Generalization: TYPE+ FROM Relationship
  source, target: [Relationship -> Classifier]
  acyclic_ax: AXIOM (FORALL (r: Relationship): source(r) /= target(r))
  parameters: [BehavoralFeature -> finite_sequence[Parameter]]
  typeof: [StructFeature -> Classifier]
  precondition, postcondition: [Operation -> bool]
  connection: [Association -> finite_sequence[AssociationEnd]]
  connection_ax: AXIOM
    (FORALL (assoc: Association): length(connection(assoc)) >= 2)
  class_attributes: [Class -> set[Attribute]]
  class_features: [Class -> set[Operation]]
  children: [Classifier -> set[Classifier]]
  parents: [Classifier -> set[Classifier]]

  %%%% TYPE DECLARATIONS: Common Behaviour - Instances and Links
  Object: TYPE+ FROM ModelElement
  null: ModelElement
  classifier: [Object -> Class]
  instance_ax: AXIOM (FORALL (o: Object): classifier(o) /= null)
  class_objects: [Classifier -> set[Object]]

  %%%% VARIABLE DECLARATIONS
  c, c1, c2: VAR Class
  f1, f2: VAR Operation
  isActive: [Class -> bool]
  isRoot?(c): bool = (parents(c) = emptyset)
  isLeaf?(c): bool = (children(c) = emptyset)
  isAbstract(c): bool = (class_objects(c) = emptyset)

  %% Sets of instances of subclasses are mutually disjoint
  disjoint_ax: AXIOM
    (FORALL c, c1, c2: (children(c)(c1) AND children(c)(c2))
       IMPLIES empty?(intersection(class_objects(c1), class_objects(c2))))
  unique_names_ax: AXIOM
    (FORALL c, f1, f2: class_features(c)(f1) AND class_features(c)(f2)
       IMPLIES (name(f1) = name(f2) IMPLIES f1 = f2))
  no_mult_parent_ax: AXIOM
    (FORALL c: singleton?(parents(c)) OR empty?(parents(c)))

END corePackage

B UML Sequence Diagrams in PVS

The following PVS specification is automatically generated from the UML sequence diagram shown in Figure 6 by using the PrUDE tool. The transformation is based on semantic definitions of UML notations provided in the PVS specification language and implemented in the PrUDE tool. In the current version of the PrUDE tool, application-specific properties are added interactively using the PVS property editor. In future versions, we plan to implement several domain-specific properties and proof strategies.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Semantic definition for a partial UML sequence diagram,
%% generated from ArgoUML model using the PrUDE tool
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
sequenceDiagram[T:TYPE+]: THEORY
BEGIN
  s: VAR set[T];
  t1,y: VAR T
  optional?(s):bool = empty?(s) OR singleton?(s)
  optional: TYPE+ = (optional?)
  Event : TYPE+
  AccessEvent : TYPE+ FROM Event
  e,x : VAR Event
  Attribute, Operation, Object: TYPE+
  Trace: TYPE+ = list[Event]
  readCard,openSession,enterPin,readPin,verifyPin,pinOk,
  enterChoice,readChoice,enterAmount,readAmount,checkBalance,
  balanceOk,provideCash,cashOk,collectCash,updateWithdraw,
  ejectCard,collectCard,closeSession,auth: Event
  Class:TYPE = [# classID: string, attributes:setof[Attribute],
                  operations:setof[Operation] #]
  t1,t2, t: VAR Trace
  n: VAR nat
  ae: VAR AccessEvent
  prefix_upto(n,t): RECURSIVE Trace =
    CASES t OF
      null: null,
      cons(e, t2) : IF n=0 THEN null
                    ELSE cons(e,prefix_upto(n-1,t2)) ENDIF
    ENDCASES
  MEASURE length(t)
  rank(e,t): RECURSIVE nat =
    IF NOT member(e,t) THEN 0
    ELSE CASES t OF
           null:0,
           cons(x,t2): IF x=e THEN 1 ELSE 1+rank(e,t2) ENDIF
         ENDCASES
    ENDIF
  MEASURE length(t)
  ax: AXIOM FORALL t,e: member(e,t) IMPLIES
        member(auth, prefix_upto(rank(e,t), t))
  SeqDiag : TYPE = [# seqDiagramID : string, objects: setof[Object],
                      traces: setof[Trace] #]
  tr: VAR Trace
  y: Event
  sq: VAR SeqDiag
  Message : TYPE = [# name : string, source : Object, target : Object #]
  pin_cash_OK(t) : bool =
    FORALL e : (e = updateWithdraw AND member(e,t)) IMPLIES
      (LET prefix = prefix_upto(rank(e,t),t) IN
         member(pinOk,prefix) AND member(cashOk,prefix))
  b, a : VAR nat   %% balance and amount, respectively
  cl : nat = 1000  %% a constant Credit Limit
  balance_OK(b,a) : bool = b-a >= 0 OR (b-a < 0 AND b-a >= -cl)
  thm1: THEOREM
    FORALL (e:Event, t:Trace):
      (e=collectCash OR e=updateWithdraw) IMPLIES
        ((member(t,traces(withdrawSq)) AND member(e,t)) IMPLIES
           subset({pinOk,balanceOk,cashOk}, prefix_upto(rank(e,t),t)))
END sequenceDiagram

C Partial Specification of the Banking System

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% PVS specification for the Banking system
%% generated from ArgoUML model using the PrUDE tool
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
bank: THEORY
BEGIN
  IMPORTING sequenceDiagram

  %%%%%%% DECLARATIONS OF TYPES %%%%%%%%
  ValueType: TYPE+
  ClassID : TYPE+ = string
  Event : TYPE+
  Trace : TYPE = list[Event]
  TransactionKind: TYPE+ = {deposit, withdraw}
  LedgerKind : TYPE+ = {drawerLedger, creditLedger, debitLedger}

  %%%%%%%%% DECLARATIONS OF CLASSES as TYPES %%%%%%%
  Transaction: TYPE+ = [# transId: int, transKind: TransactionKind,
                          amount: int #]
  Account: TYPE+ = [# accountNum : string, balance : nat, pin : int,
                      trans: list[Transaction], trace : list[Event] #]
  Ledger: TYPE+ = [# kind : LedgerKind, trans : list[Transaction],
                     amount : int #]
  Bank: TYPE+ = [# accounts: setof[Account], drawer : Ledger,
                   credit : Ledger, debit : Ledger #]

  %%%%%%% DECLARATIONS OF VARIABLES %%%%%%%
  acc, acc1: VAR Account
  tr : VAR Trace
  t, t2: VAR Transaction
  b, b1, b2: VAR Bank
  l, l1, l2: VAR Ledger
  lt : VAR list[Transaction]

  %%%%%% CONSTRUCTIVE DEFINITIONS OF OPERATIONS %%%%%
  acc_bank_ax: AXIOM
    (FORALL acc,b1,b2: accounts(b1)(acc) AND accounts(b2)(acc)
       IMPLIES b1=b2)
  trans_ledger_ax: AXIOM
    (FORALL l1,l2: member(t,trans(l1)) AND member(t,trans(l2))
       IMPLIES l1=l2)
  neg(t): Transaction = t WITH [amount:= -amount(t)]
  sum_ledger(lt): RECURSIVE int =
    CASES lt OF
      null: 0,
      cons(t,lt1): amount(t)+sum_ledger(lt1)
    ENDCASES
  MEASURE length(lt)
  balanced?(b): bool =
    sum_ledger(trans(drawer(b))) + sum_ledger(trans(credit(b)))
      + sum_ledger(trans(debit(b))) = 0
  processTrans(t,b): Bank =
    IF transKind(t) = withdraw THEN
      b WITH [drawer:=drawer(b) WITH [trans:=cons(neg(t),trans(drawer(b)))],
              credit:=credit(b) WITH [trans:=cons(t,trans(credit(b)))]]
    ELSE IF transKind(t)=deposit THEN
      b WITH [drawer:=drawer(b) WITH [trans:=cons(t,trans(drawer(b)))],
              debit:=debit(b) WITH [trans:=cons(neg(t),trans(debit(b)))]]
    ELSE b ENDIF
    ENDIF
  thm1: THEOREM
    (FORALL t,l: (member(t,trans(l)) AND
                  (transKind(t)=deposit OR transKind(t)=withdraw))
       IMPLIES (EXISTS t2, l2: member(t2,trans(l2)) AND
                               (t2=t WITH [amount:= -amount(t)])))
  thm2: THEOREM
    (FORALL t,b: balanced?(b)=> balanced?(processTrans(t,b)))
END bank

D Proof of Theorem thm2

thm2 :
  |-------
{1}   FORALL (t, b): balanced?(b) => balanced?(processTrans(t, b))

Trying repeated skolemization, instantiation, and if-lifting, then
Expanding the definition of sum_ledger, and then
Expanding the definition of processTrans(), this simplifies to:
thm2 :

{-1}  CASES trans(credit(b!1)) OF
        null: 0, cons(t, lt1): amount(t) + sum_ledger(lt1) ENDCASES
      + CASES trans(debit(b!1)) OF
          null: 0, cons(t, lt1): amount(t) + sum_ledger(lt1) ENDCASES
      + CASES trans(drawer(b!1)) OF
          null: 0, cons(t, lt1): amount(t) + sum_ledger(lt1) ENDCASES
      = 0
  |-------
{1}   CASES IF transKind(t!1) = withdraw
              THEN cons(t!1, trans(credit(b!1)))
            ELSE b!1`credit`trans ENDIF OF
        null: 0, cons(t, lt1): amount(t) + sum_ledger(lt1) ENDCASES
      + CASES IF transKind(t!1) = withdraw
                THEN b!1`debit`trans
              ELSE cons(neg(t!1), trans(debit(b!1))) ENDIF OF
          null: 0, cons(t, lt1): amount(t) + sum_ledger(lt1) ENDCASES
      + CASES IF transKind(t!1) = withdraw
                THEN cons(neg(t!1), trans(drawer(b!1)))
              ELSE cons(t!1, trans(drawer(b!1))) ENDIF OF
          null: 0, cons(t, lt1): amount(t) + sum_ledger(lt1) ENDCASES
      = 0

Lifting IF-conditions to the top level,
thm2 :

{-1}  IF null?(trans(credit(b!1)))
        THEN (0
              + (CASES trans(debit(b!1)) OF
                   null: 0, cons(t, lt1): amount(t) + sum_ledger(lt1) ENDCASES)
              + (CASES trans(drawer(b!1)) OF
                   null: 0, cons(t, lt1): amount(t) + sum_ledger(lt1) ENDCASES))
             = 0
      ELSE amount(car(trans(credit(b!1))))
           + sum_ledger(cdr(trans(credit(b!1))))
           + CASES trans(debit(b!1)) OF
               null: 0, cons(t, lt1): amount(t) + sum_ledger(lt1) ENDCASES
           + CASES trans(drawer(b!1)) OF
               null: 0, cons(t, lt1): amount(t) + sum_ledger(lt1) ENDCASES
           = 0
      ENDIF
  |-------
{1}   IF transKind(t!1) = withdraw
        THEN CASES cons(t!1, trans(credit(b!1))) OF
               null: 0, cons(t, lt1): amount(t) + sum_ledger(lt1) ENDCASES
             + CASES b!1`debit`trans OF
                 null: 0, cons(t, lt1): amount(t) + sum_ledger(lt1) ENDCASES
             + CASES cons(neg(t!1), trans(drawer(b!1))) OF
                 null: 0, cons(t, lt1): amount(t) + sum_ledger(lt1) ENDCASES
             = 0
      ELSE CASES b!1`credit`trans OF
             null: 0, cons(t, lt1): amount(t) + sum_ledger(lt1) ENDCASES
           + CASES cons(neg(t!1), trans(debit(b!1))) OF
               null: 0, cons(t, lt1): amount(t) + sum_ledger(lt1) ENDCASES
           + CASES cons(t!1, trans(drawer(b!1))) OF
               null: 0, cons(t, lt1): amount(t) + sum_ledger(lt1) ENDCASES
           = 0
      ENDIF

Trying repeated skolemization, instantiation, and if-lifting,
This completes the proof of thm2.
Q.E.D.

E Association and Inheritance in UML

inheritance : THEORY
BEGIN
  %% IMPORTING
  IMPORTING bank
  IMPORTING corePackage

  %% TYPE DECLARATIONS - Inheritance
  Inheritance : TYPE+ FROM Relationship
  c1, c2 : VAR Class
  i: VAR Inheritance
  inh_ax: AXIOM (source(i)= c1 AND target(i)= c2 IFF
                 children(c2)(c1) AND parents(c1)(c2))

  %%% DECLARATION CLASS Person AND ITS SUBCLASSES
  Person: TYPE+ FROM Class
  Customer : TYPE+ FROM Person
  Employee : TYPE+ FROM Person

  %%%%% SOME VARIABLE DECLARATIONS %%%%%%%%
  b : VAR Bank
  acc, acc1, acc2 : VAR Account
  p, p1, p2 : VAR Person
  c : VAR Customer
  e: VAR Employee

  %%%%%% DECLARATION OF ASSOCIATIONS %%%%%%%%%%%%%
  owns : [Person -> set[Account]]
  updates : [Person -> set[Account]]
  uses : [Bank -> set[Person]]
  worksfor : [Bank -> set[Person]]

  %%%%%% AXIOMS %%%%%%%%%%%
  uses_ax: AXIOM
    (FORALL p,b: uses(b)(p) IFF
       (EXISTS acc: accounts(b)(acc) AND
          (owns(p)(acc) IMPLIES NOT updates(p)(acc))))
  worksfor_ax: AXIOM
    (FORALL p,b: worksfor(b)(p) IFF
       (EXISTS acc: accounts(b)(acc) AND
          (updates(p)(acc) IMPLIES NOT owns(p)(acc))))

  %%% An employee is not allowed to update his own account
  emp_cust_ax: AXIOM
    (FORALL e,b,acc: (uses(b)(e) AND worksfor(b)(e)) IMPLIES
       intersection(owns(e), updates(e)) = emptyset)

  %%% Declaration of {xor} constraint as an axiom
  xor_ax: AXIOM (FORALL p,acc: NOT (owns(p)(acc) IFF updates(p)(acc)))

  thm6: THEOREM
    (FORALL p,b,acc: (worksfor(b)(p) AND uses(b)(p)) IMPLIES
       NOT (owns(p)(acc) IFF updates(p)(acc)))
END inheritance

F Proofs of Theorem thm6

thm6 :

  |-------
{1}   (FORALL p,b,acc: (worksfor(b)(p) AND uses(b)(p)) IMPLIES
         NOT (owns(p)(acc) IFF updates(p)(acc)))

Rule? (grind :theories ("inheritance"))
Trying repeated skolemization, instantiation, and if-lifting,
this yields 2 subgoals:
thm6.1 :

{-1}  GeneralizableElement_pred(p!1)
{-2}  Classifier_pred(p!1)
{-3}  Class_pred(p!1)
{-4}  Person_pred(p!1)
{-5}  owns(p!1)(acc!1)
{-6}  updates(p!1)(acc!1)
  |-------

Rule? (postpone)
Postponing thm6.1

thm6.2 :

{-1}  GeneralizableElement_pred(p!1)
{-2}  Classifier_pred(p!1)
{-3}  Class_pred(p!1)
{-4}  Person_pred(p!1)
  |-------
{1}   owns(p!1)(acc!1)
{2}   updates(p!1)(acc!1)

Rule? quit
Run time = 1.45 secs.
Real time = 50.58 secs.

The two generated subgoals, thm6.1 and thm6.2, are not provable. Hence, to prove the theorem, we need to add an axiom (see Section 3.4 for details). The following is a successful proof of theorem thm6.

thm6 :

  |-------
{1}   (FORALL p, b, acc: (workers(b)(p) AND workers(b)(p)) IMPLIES
         NOT (owns(p)(acc) IFF updates(p)(acc)))

Trying repeated skolemization, instantiation, and if-lifting,
this completes the proof of thm6.
Q.E.D.
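
The two unprovable subgoals can be read off directly. In thm6.1 both owns(p!1)(acc!1) and updates(p!1)(acc!1) hold, and in thm6.2 neither holds. These are exactly the two cases that an {xor} constraint between the associations rules out. As a minimal sketch, assuming the added axiom is xor_ax as declared in Appendix E (the actual change is described in Section 3.4 and may differ), the proof could also be driven by hand instead of by grind. The command sequence below is an illustration and not a transcript of a recorded PVS session; the lines starting with ;; are annotations.

```pvs
;; Hypothetical manual proof of thm6.
;; Assumption: xor_ax is available in theory inheritance.
(skosimp*)                 ;; introduce p!1, b!1, acc!1 and flatten the implication
(lemma "xor_ax")           ;; add the {xor} axiom to the antecedent
(inst -1 "p!1" "acc!1")    ;; instantiate it at the skolem constants
(prop)                     ;; the instance contradicts the flattened goal
```

After (skosimp*), the negated conclusion appears in the antecedent as owns(p!1)(acc!1) IFF updates(p!1)(acc!1). The instantiated axiom states the negation of that formula, so propositional simplification closes the goal without using the hypotheses on worksfor and uses.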
