LIGHTWEIGHT VALIDATION OF NATURAL LANGUAGE REQUIREMENTS
Author: Sander van Meggelen, BSc
Group number: 4
Date: 26-3-2010 (Draft)

ABSTRACT
In this paper, the approach of Gervasi and Nuseibeh (2002) for the partial validation of natural language requirements is described. Besides a description of the use and goals of the method, a description of its eight main activities is given, followed by an example of its use. Thereafter, the context of the approach is sketched, first by giving more insight into the background of the authors, and then by providing a scientific context that covers overlapping, preceding and follow-up research related to the article of Gervasi and Nuseibeh (2002).

INTRODUCTION

DESCRIPTION OF USE
Within the requirements engineering world, lightweight formal methods have become increasingly popular (Jackson, 1996; Feather, 1998; Gervasi & Nuseibeh, 2002). In the context of requirements engineering, lightweight formal methods can be defined as methods whose cost is low relative to the cost of requirements engineering as a whole (Gervasi & Nuseibeh, 2002). Such methods are not meant for converting informal requirements documents into formal documents; rather, they are used to validate requirements gradually, covering only specific parts of a requirements document. In this way, lightweight formal methods clear the way for a complete and fully covered analysis if one is needed (Gervasi & Nuseibeh, 2002). The goal of this approach is to provide usable, low-cost, automated analysis of natural language requirements.

MAIN ACTIVITIES
The method defines a total of eight steps for performing lightweight formal validation of requirements. The steps, as described in the article of Gervasi and Nuseibeh (2002), are summarized below. The first three steps are the set-up steps. The first step is to define a style, a structure and a language for the specific document that contains the requirements.
The second step is to select the parts of the document that the method should check. As described above, the lightweight method is usually applied only to those parts of a document that are of interest. The third step is to determine which properties of the text selected in the second step will be checked, and against which models.

Steps four to eight are the production steps, and they can be iterated at any stage. Note that all of these steps, with the exception of step eight, are fully automatic. Step four is the pre-processing step, in which the format, structure and typographical details are translated into a canonical form so that the text can be processed further automatically. Step five consists of parsing the natural language text into semantic content. In step six, the models chosen in step three are built. Step seven checks whether the selected properties hold for these models. Finally, step eight, which is a manual step, consists of evaluating the findings and checking and correcting the requirements specification (Gervasi & Nuseibeh, 2002).

EXAMPLE OF METHOD
To describe and elaborate on the steps given in the main activities section, a (fictitious) natural language requirements document describing the requirements of a passenger car is used. A dealer can use this requirements document to check whether the car he received from the manufacturer meets its requirements.

In the first step, the style, structure and language of the document are defined. The document was written in a consistent style and structure, has a good structural quality and was written in English. In the second step, the properties to check are determined. As the requirements document contains many technical details that cannot be checked by the dealer, only externally observable properties were selected.
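The selection made in the second step of this example can be pictured as a simple filter over candidate properties. The sketch below is a hypothetical illustration and not part of the Circe environment: the `Property` class, its `externally_observable` flag and the sample properties are invented for this fictitious car document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Property:
    """A candidate property of the (fictitious) car requirements document."""
    name: str
    externally_observable: bool  # can the dealer check it without opening up the car?


def select_properties(candidates):
    """Keep only the properties the dealer is able to check."""
    return [p for p in candidates if p.externally_observable]


# Hypothetical candidate properties, invented for this example.
candidates = [
    Property("number of wheels", True),
    Property("number of doors", True),
    Property("fuel injection timing", False),
]
print([p.name for p in select_properties(candidates)])
# prints ['number of wheels', 'number of doors']
```

In a real set-up the flag would be a judgement made by the person defining the validation, not a field stored in the document.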
In the third step, the models against which the properties will be checked are selected; in this case, the Circe environment is used to provide automated analysis of the requirements. The environment supports the extraction of models from the requirements, their validation, and the collection of metric data about the requirements document, the system described in it and the requirements writing process itself (Gervasi & Nuseibeh, 2002). In step four, the document is pre-processed; here this simply amounts to copying the text of the original PDF document, which was not copy-protected, and pasting it into the tool. Step five consists of parsing the natural language text. The tool selected in step three requires a document defining the domain-specific terms; this document was provided by the dealer's parent organization and covered all domain-specific terms used in the requirements document. In step six, the models were built automatically using the Circe environment. In step seven, the properties selected in step two were checked. Each property was analyzed using specialized validators (supported by the Circe environment), each consisting of a few lines of code and mainly focusing on consistency. In step eight, the results are checked manually by a requirements specialist, mainly for consistency errors. As this example features a fictitious requirements document and there is no access to the Circe environment, a concrete example from Gervasi and Nuseibeh (2002) is shown in Figure 1, illustrating how natural language text is automatically transformed into more formal requirements.

Figure 1: Example of how text from a natural language requirements document is transformed into more formal requirements text (Gervasi & Nuseibeh, 2002).

TEMPLATE REGARDING RULE INPUT
Besides the natural language document that is fed into the parsing programs, Vincenzo Gervasi and Bashar Nuseibeh use domain-specific rules that are required as input for their tools.
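Parsing (step five) and checking against domain-specific rules can be illustrated together with a deliberately small sketch. It is not the parsing technique used by Circe: it assumes one hypothetical sentence pattern ("The <object> shall have <n> <subject>.") and encodes each rule as a row with the Object, Subject, Comparison and Conditional columns of Table 1. Only the numeric rows (the car and motorcycle wheel rules) are handled, and the names `DomainRule`, `parse` and `check` are invented for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainRule:
    """One row in the Object/Subject/Comparison/Conditional format of Table 1."""
    obj: str
    subject: str
    comparison: str  # "equals" or "less than"
    conditional: int


# Hypothetical sentence pattern; a real tool would use a much richer grammar.
PATTERN = re.compile(r"^The (\w+) shall have (\d+) (\w+)\.$", re.IGNORECASE)

# Maps each comparison keyword to a predicate over (actual value, rule value).
COMPARISONS = {
    "equals": lambda actual, expected: actual == expected,
    "less than": lambda actual, expected: actual < expected,
}


def parse(sentence):
    """Toy step five: sentence -> (object, subject, value), or None if unmatched."""
    match = PATTERN.match(sentence.strip())
    if match is None:
        return None
    obj, value, subject = match.groups()
    return obj.lower(), subject.lower(), int(value)


def check(sentences, rules):
    """Collect a finding for every unparsed sentence and every rule violation."""
    findings = []
    for sentence in sentences:
        parsed = parse(sentence)
        if parsed is None:
            findings.append((sentence, "not parsed"))
            continue
        obj, subject, value = parsed
        for rule in rules:
            applies = (rule.obj.lower(), rule.subject.lower()) == (obj, subject)
            if applies and not COMPARISONS[rule.comparison](value, rule.conditional):
                findings.append((sentence, f"violates: {rule.obj} {rule.subject} "
                                           f"{rule.comparison} {rule.conditional}"))
    return findings


rules = [
    DomainRule("Car", "wheels", "equals", 4),
    DomainRule("Motorcycle", "wheels", "less than", 4),
]
requirements = [
    "The car shall have 4 wheels.",
    "The motorcycle shall have 4 wheels.",
    "The car should be comfortable.",
]
for finding in check(requirements, rules):
    print(finding)
```

In this run the car requirement passes, the motorcycle requirement is reported as violating the "less than 4" rule, and the third sentence is reported as not parsed because it falls outside the pattern. In the spirit of step eight, such findings are meant to be evaluated by a person rather than acted on automatically.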
The use of such rules is described by Gervasi and Nuseibeh (2002). Table 1 suggests an input format for these domain-specific rules.

Object     | Subject    | Comparison | Conditional
CP         | FDIR state | equals     | TRUE
Car        | wheels     | equals     | 4
Motorcycle | wheels     | less than  | 4
Table 1: Example input method for domain-specific rules

CREATORS
The creators of the method are Vincenzo Gervasi and Bashar Nuseibeh. Vincenzo Gervasi is an assistant professor at the University of Pisa and an honorary associate at the Department of Software Engineering of the University of Technology, Sydney. His research interests lie in requirements engineering, software architecture, and text and language (Gervasi, N.D.). Bashar Nuseibeh is Professor of Computing at The Open University, and Professor of Software Engineering and Chief Scientist at Lero, the Irish Software Engineering Research Centre. His research interests lie in software specification and design in general, and in requirements engineering, inconsistency management and security requirements engineering in particular. In all of these areas, he is interested in producing lightweight tools to automate and/or guide development activities (Nuseibeh, 2009).

PROCESS-DELIVERABLE DIAGRAM
To summarize the method, Figure 2 shows a process-deliverable diagram (PDD) of the method. The PDD shows the activities of the method on the left and links them to the deliverables on the right. This diagram elaborates on Figure 1 of the article of Gervasi and Nuseibeh (2002). Table 1 and Table 2 describe the processes and concepts shown in Figure 2.
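The chain of activities and deliverables summarized by the PDD can also be sketched as a small pipeline. This is a toy stand-in, not the Circe implementation: pre-processing only lower-cases and normalizes whitespace, "parsing" only records which single-word glossary terms a sentence mentions, the "model" only counts term usage, and the `unused_terms` property, the sample document and the glossary are all hypothetical. It does show the shape of the method: the set-up choices (selected sections, glossary, properties) are made by hand, steps four to seven run automatically, and the returned findings are what a requirements engineer would evaluate in step eight.

```python
import re
from collections import Counter


def preprocess(document, selected_sections):
    """Toy step four: keep the sections chosen in step two and put every
    sentence into a simple canonical form (lower case, single spaces)."""
    sentences = []
    for section in selected_sections:
        for raw in re.split(r"[.!?]", document[section]):
            canonical = " ".join(raw.lower().split())
            if canonical:
                sentences.append(canonical)
    return sentences


def parse(sentences, glossary):
    """Toy step five: pair each sentence with the glossary terms it uses."""
    return [(s, {t for t in glossary if t in s.split()}) for s in sentences]


def build_model(parsed, glossary):
    """Toy step six: a model that counts how often each glossary term is used."""
    counts = Counter(t for _, terms in parsed for t in terms)
    return {t: counts[t] for t in glossary}


def check_properties(model, properties):
    """Step seven: run every property check; step eight reviews the result by hand."""
    return {name: check(model) for name, check in properties.items()}


def unused_terms(model):
    """Hypothetical property: every glossary term should be used somewhere."""
    return sorted(t for t, count in model.items() if count == 0)


# Hypothetical input for the set-up steps one to three.
document = {
    "exterior": "The car has four wheels. The car has two mirrors.",
    "engine": "The injection timing is adjustable.",
}
glossary = {"car", "wheels", "mirrors", "trunk"}
sentences = preprocess(document, ["exterior"])
model = build_model(parse(sentences, glossary), glossary)
findings = check_properties(model, {"unused terms": unused_terms})
print(findings)  # {'unused terms': ['trunk']}
```

Keeping each step as a separate function mirrors the fact that the production steps can be iterated: after a correction in step eight, steps four to seven can simply be rerun.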
Figure 2: PDD of the approach of Gervasi and Nuseibeh (2002) for the partial validation of natural language requirements

DESCRIPTION OF ACTIVITIES
Activity | Sub-activity | Description
Prepare validation | Pre-process document | The activity in which the format, structure and typographical details are translated into a canonical form so that the text can be used for further automatic processing (Gervasi & Nuseibeh, 2002).
 | Define style | Defining the style of the document.
 | Define structure | Defining the structure of the document.
 | Define language | Defining the natural language in which the document was written (English, Dutch, etc.).
 | Parse NL requirements | Parsing the natural language text into semantic content.
Create models | Define models | Defining the models against which the properties are going to be checked. Defining can consist of simply naming existing models or of elaborating new ones.
 | Build models | Building the models from the parsed document, on the basis of the models defined in the previous step (Gervasi & Nuseibeh, 2002).
Process | Define properties | Which properties of a document, or of the system described in it, are 'interesting' depends on the particular context of the analysis. As is common with lightweight formal methods, partial validation is usually acceptable at this stage (Gervasi & Nuseibeh, 2002).
 | Check properties | Checking whether the selected properties hold for the built models (Gervasi & Nuseibeh, 2002).
 | Evaluate findings | Evaluating the findings and revising the requirements specification accordingly. It is particularly important that the validation checks provide as much detail as possible about the point of and the reason for a failure (i.e. about the circumstances in which a validation property was violated). Like the counterexamples provided by other formal methods, this information helps the requirements engineer to identify and fix the errors that cause violations (Gervasi & Nuseibeh, 2002).
Table 1: Processes of Figure 2 and their descriptions

DESCRIPTION OF CONCEPTS
Concept | Description
PRE-PROCESSED TEXT | The pieces selected from the original base document (Gervasi & Nuseibeh, 2002).
PARSE TREES | The fragments of the selected pieces of text, made ready for automated parsing. The fragments can only be parsed automatically once their style, structure and language have been determined (Gervasi & Nuseibeh, 2002).
PARSED TEXT | The text after it has been translated into a canonical form and subsequently into a semantic form, so that it can be used for further automatic processing.
MODELS | The selected models against which the selected properties will be checked. An example is a model built in the Circe automated environment (Gervasi & Nuseibeh, 2002).
PROPERTIES | Properties are always relative to models, i.e. abstractions of the document or of the system described in it, which collect in an analyzable structure the information needed to check the property (Gervasi & Nuseibeh, 2002).
RESULT DOCUMENT | The final parsed text, together with the results of checking the properties against the models. This document is checked manually and adjusted as needed by a requirements engineer (Gervasi & Nuseibeh, 2002).
Table 2: Concepts of Figure 2 and their descriptions

RELATED LITERATURE
The approach of Gervasi and Nuseibeh (2002), which focuses on lightweight validation to spot and identify problems in natural language requirements, was preceded by research that focused solely on the formal specification of requirements. Examples are Jackson and Damon (1996) and Reubenstein and Waters (1991), who propose that natural language requirements should be written using the syntax of the functional programming language LISP. Other earlier research describes methods that use event/condition tables to check consistency (Heitmeyer, Labaw, & Bharadwaj, 1996).
Other research suggested keeping formal and natural language requirements side by side, either by supplementing the formal requirements with natural language comments, without any guarantee that the two are consistent with each other (Duffy, MacNish, McDermid, & Morris, 1995), or by using natural language to support the interaction with, for example, customers or other non-technical people (Dalianis, 1992). Parsing techniques used (in part) by the approach of Gervasi and Nuseibeh (2002) include the NL-OOPS tool (Mich, 1996), the LOLITA processing system (Morgan et al., 1995) and the parsing techniques of Rolland (Rolland, 1992; Gervasi & Nuseibeh, 2002). After the approach discussed in this paper was published, Nuseibeh continued with research on the analysis of event-based requirements specifications (Russo, Miller, Nuseibeh, & Kramer, 2002). The other author, Gervasi, continued with research building directly on the approach: in a journal article, Gervasi and Zowghi (2005) discuss the use of the logic representation that results from natural language requirements, and how the resulting findings can be expressed, discussed and clarified with stakeholders.

REFERENCES
Dalianis, H. (1992). A method for validating a conceptual model by natural language discourse generation. Lecture Notes in Computer Science, 593, 425-444.
Duffy, D., MacNish, C., McDermid, J., & Morris, P. (1995). A framework for requirements analysis using automated reasoning. Lecture Notes in Computer Science, 932, 68-81. Springer-Verlag.
Feather, M. (1998). Rapid application of lightweight formal methods for consistency analyses. IEEE Transactions on Software Engineering, 24, 949-959.
Gervasi, V. (N.D.). CV. Retrieved February 17, 2009, from Vincenzo Gervasi homepage: http://circe.di.unipi.it/~gervasi/main/
Gervasi, V., & Nuseibeh, B. (2002). Lightweight validation of natural language requirements. Software - Practice and Experience, 32, 113-133.
Gervasi, V., & Zowghi, D. (2005). Reasoning about inconsistencies in natural language requirements. ACM Transactions on Software Engineering and Methodology, 14, 277-330.
Heitmeyer, C., Labaw, B., & Bharadwaj, R. (1996). Automated consistency checking of requirements specifications. ACM Transactions on Software Engineering and Methodology, 5, 231-261.
Jackson, D. (1996). Lightweight formal methods. IEEE Computer, 29, 21-22.
Mich, L. (1996). NL-OOPS: From natural language requirements to object oriented requirements using the natural language processing system LOLITA. Journal of Natural Language Engineering, 2, 161-187.
Morgan, R., Garigliano, R., Callaghan, P., Poria, S., Smith, M., Uranowicz, A., et al. (1995). Description of the LOLITA system as used in MUC-6. Proceedings of the 6th Conference on Message Understanding (pp. 71-85). Maryland: Association for Computational Linguistics.
Nuseibeh, B. (2009, October 1). Research Interests. Retrieved February 15, 2010, from Bashar Nuseibeh homepage: http://mcs.open.ac.uk/ban25/
Reubenstein, H. B., & Waters, R. C. (1991). The requirements apprentice: Automated assistance for requirements acquisition. IEEE Transactions on Software Engineering, 17, 226-240.
Rolland, C. (1992). A natural language approach for requirements engineering. Advanced Information Systems, 593, 57-89.
Russo, A., Miller, R., Nuseibeh, B., & Kramer, J. (2002). An abductive approach for analysing event-based requirements specifications. Proceedings of the 18th International Conference on Logic Programming (pp. 22-37). Copenhagen: Springer.