A – Draft Paper
By Inah Omoronyia
1
, Guttorm Sindre
1
, Tor Stålhane
1
, Stefan Biffl
2
, Thomas Moser
2
, and Wikan Sunindyo
2
1
Department of Computer and Information Science, Norwegian University of Science and Technology, Trondheim, Norway
2
Institute of Software Technology and Interactive Systems, Vienna University of Technology, Vienna, Austria
Draft Paper
Bernadette Puspa Budhawara
INFOME-2013
1 | P a g e
Introduction
Requirements elicitation is one main activity of requirements engineering. It has objective to understand the business process in the application domain , generate list of activities within the process and discovers restriction, regulation and boundaries within the process (Kaiya & Saeki, 2006). Requirements engineering defines as a discipline that elaborate customer’s needs through communication, documentation and implementation (Farfeleder et al., 2011). In this research, the researchers used ontology in eliciting the requirement in order to map and remove gaps between assumptions and decisions during the process
(Omoronyia et al., 2010). Ontologies are schemes for reading the meta-data. It controlled the vocabulary concept of the given texts, provides the match definition for each word and processes it semantically
(Maedche & Staab, 2001) and it based on the real world communication structures concept (Gruber, 1993).
The procedure continued by clustered the output from requirement elicitation and map it into one specific knowledge domain (Gruninger & Lee, 2002).
The research goal is proposed a new method as guidance to perform requirements elicitation based on ontology in a semi-automated way, which not yet covered by the preceding ontology based requirements elicitations methods proposed by Lee & Zhao (2006) and Shibaoka et al. (2007). The last two proposed methods need some manual ontology processing steps before some statements can be processed in requirements elicitation, which might increases chance of input inconsistencies for requirement elicitations.
For example if the requirements statement analyzed by two ontology analysts, both fluent in English but originally come from two different countries – for instance one from India while the others from
Macedonia. The ontology analytic results might be differs as their understanding toward the topics as each vocabularies and its semantic relationship translated differently. Thus, from industry perspective, the deliverable of this research might be useful in decreasing the chance of input inconsistency and the amount of tasks needed in processing the documents that need to be interpreted by domain experts/ ontology analyst.
The proposed method called Rule-based ontology for requirements elicitation . It provides some possibilities for capturing the initial or baseline concepts-relations and map it to a specific domain ontology based on some existing texts. Their rule-based ontology for requirements elicitation method which consists of the following steps: (1) Prepare the text documents manually by removing symbols and/ or edit the text format; (2) Trail brackets; automatically map brackets or dashes and change it into punctuation marks and generate supplementary texts. For example in the given sentence: ‘mobile-phone makers (Nokia, Samsung,
Apple, Blackberry and many others)’. Bracket trailing will extract the “Nokia”, “Samsung”, “Apple”,
“Blackberry” and “many others” and related with the concept “makers”; (3) Complete bridged-term; locate and correcting bridge-term in the text. Given the phrase “inhale and exhale respiration methods”, with bridged-term completion it should suggest the combine terms “inhale respiration methods” and “exhale respiration methods” as concepts than just “inhale”; (4) Extract SPO (subject-predicate-object); automatically identify subjects, predicates and objects, continue with categorize them as concepts and relations between concepts; (5) Mine association; automatically identifies head and link it to the origin subjects/ objects from the previous step; and (6) Cluster the (finding) concepts-relations. The output will be used by requirement analyst or product manager to start the requirements analysis and negotiation phase.
The authors were six academic researchers from Department of Computer and Information Science in
Norwegian University of Science and Technology and Institute of Software Technology and Interactive
Systems in Vienna University of Technology. All of them have a software engineering academic background and specialize in semantic programming and interactive systems.
2 | P a g e
Example
This section discusses the use case of the Rule-based ontology for requirements elicitation method in processing the following texts as part of requirement. The given requirement text is a part of project requirements for creating management report called Overview OU.
Requirement texts:
The Overview OU report is a management report that shows the availability hours from each resource. The report must shows conflict between direct and indirect hours from each resource. The conflicts should be visible. The manager should be able to access this report and exportable into a spreadsheet format file (excel and csv). This report updated every days and the manager will receive a notification if the report is not ready or corrupted.
The following are the steps the reviewer took to adapt the Rule-based ontology for requirements elicitation method in processing the requirement texts.
(1) Prepare the text documents manually by removing symbols and/ or edit the text format;
The text does not contain any symbols, continue with step 2.
(2) Trailing any brackets ;
Locate some texts within the brackets, which are “excel” and “csv” and link it to concept “file”.
No other brackets found.
(3) Complete bridged-term ;
Locate some texts that consider as bridge-term. Those are “..
direct and indirect hours
”. After the correction the concepts are “direct hours” and “indirect hours”
(4) Extract SPO (subject-predicate-object) into concept – relations.
The extraction starts with the first text. In this example it first extract “The” addressing title of the report, continue with “Overview”, “OU”, “report” and so on.
(5) Mine association .
Identified the associations between concepts with head – parent-child relationships, based on the given text the parent-child captured are “management report”, “report”, “availability hours”, hours”, “management”,
“manager”, “access”, and “format file”
(6) Finalize the finding concepts and relations into clusters.
Cluster the finding concepts and relation. Explicit concepts/ relations are collected directly from the requirements text, while assumed concepts/ relations are results of reasoned based to cluster concepts.
The results of extractions summarize into concepts and relations displayed in table 1 and figure 1 below
(light yellow marked is the parent-child):
Concept Relations
Assumed
8
Explicit Assumed Explicit Parent-Child
23 6
Table 1
8 9
Figure 1
3 | P a g e
Process Deliverable – Diagram (PDD)
The following section will discuss the representative of the Rule-based ontology for requirements elicitation method in Process Deliverable – Diagram (PDD). The method provides some possibilities for capturing the initial or baseline concepts-relations and map it to a specific domain ontology based on some existing texts. The detail PDD depicted in Figure 2.
The process consists of the following activities:
(1) Prepare the text documents manually by removing symbols and/ or edit the text format;
(2) Trail brackets; automatically map brackets or dashes and change it into punctuation marks and generate supplementary texts. For example in the given sentence: ‘the notification should be possible to be saved as document file types (excel, pdf or docx )’. Bracket trailing will extract the “excel”, “pdf”, and “docx” and related with the concept “file-types”;
(3) Complete bridged-term; locate and correcting bridge-term in the text. Given the phrase “inhale and exhale respiration methods”, with bridged-term completion it should suggest the combine terms “inhale respiration methods” and “exhale respiration methods” as concepts than just “inhale”;
(4) Extract SPO (subject-predicate-object); automatically identify subjects, predicates and objects, continue with categorize them as concepts and relations between concepts;
(5) Mine association; automatically identifies head and link it to the origin subjects/ objects from the previous step; and
(6) Cluster the (finding) concepts-relations. The output will be used by requirement analyst or product manager to start the requirements analysis and negotiation phase.
4 | P a g e
Figure 2
Rule-Based Baseline Ontology for Req. Elicitation
Preparation
Locate Symbol/ Special Characters
[Run again]
[identify]
[else]
Edit Symbol/ Special Characters
Resume Preparation
Requirement analyst
Trail-Brackets
[Run again]
Locate Brackets/ Dashes
[identify]
Remove Predefine Reference
Pointers
[else]
Extract Subjects/ Objects
Link Subjects/ Objects to Head
Sentences
Complete
Bridge-term
[Run again]
Resume Requirements Statements
Automated
Locate Bridge-term
[identify]
Analyze Bridge-term pattern
Automated
Correct Bridge-term
Automated
[else]
Tag Correction
Automated
Extract SPO
Control Text Construction
Semi - Automated
Locate Subject-Predicate-Object
(SPO)
[identify]
Control SPO
[existed]
[else]
Define Concept
Define Relation
[Run again]
Resume text after SPO extraction
Automated
Mine association
[Run again]
Locate Head
[identify]
Link head with Subject/Object
[else]
Extract prepositional phrase
(PP) Relation
Automated
Cluster
Concepts
Match lexical similarity
Generate Taxonomy Tree
Resume Clusters
Automated
Elicit Requirements
Requirements Analyst
REQUIREMENT DOCUMENT
1
0..*
SYMBOL/ SPECIAL CHARACTER LIST
EDITED SYMBOL/ SPECIAL CHARACTER LIST
0..*
PREPARED REQUIREMENT 1..*
PrepReqDoc_ID
ReqDoc_ID
Prep_Req_Status
Prep_Req_Doc
EXTRACTION
1
0..1
BRACKET/ DASH 0..*
Brack_Dash_ID
Extraction_ID
Type
Texts in Bracket-Dash
REMOVED PREDEFINED
REFERENCE POINTERS
1..*
SO – HEAD SENTENCE
SUBJECT/ OBJECT (SO)
1..*
1..*
1..*
REQUIREMENT AFTER TRAIL-BRACKETS
BRIDGE-TERM PATTERN
0..*
CORRECTION
0..*
TAG
BRIDGE-TERM
1 1
0..* has
1..*
1..*
RECONSTRUCTED REQUIREMENT
SUBJECT-PREDICATE-OBJECT
1
1..*
1..*
SPO LIST
CONCEPT
RELATION
SPO EXTRACTION
1..*
1..*
1..*
HEAD ASSOCIATION
1
1..*
HEAD
– SUBJECT/ OBJECT
1..*
PP RELATION
SEMANTIC CLUSTER
1
1..*
LEXICAL SIMILARITY
0..*
1..*
TAXONOMY TREE
1..*
CLUSTERED
0..*
ELICITED REQUIREMENT
1..*
5 | P a g e
The next 2 tables (Table 2 & 3) contain the list of activities and the deliverable tables to complete the PDD explanation.
Table 2. Activities – Sub Activities
Activity Sub Activitity Description
Preparation
Trail-Brackets
Complete Bridge-
Term
Locate Symbol/ Special Characters
Edit Symbol/ Special Characters
Resume Preparation
As the first sub-activity in preparation activity, the requirement analyst needs to locate any symbol or special characters within the requirement descriptions. This sub-activity will be repeated until all symbols/ special characters located.
If the requirement analyst locates any symbol/ special characters, it should be edited or if necessary removed by the requirement analyst.
The requirement analyst resumes the preparation activity by manually prepare the requirement descriptions after removing edited symbol or special characters.
Locate Brackets/ Dashes All the sub-activities of trail-brackets activity automatically processed. The first sub-activity is locating any brackets/ dashes within the requirement documents and set a reference pointer when discovered one. Requirement description normally uses pair of brackets or dashes as punctuation marks to separate supplementary text with the others (Omoronyia et al., 2010). This sub-activity will be repeated until all brackets/ dashes located.
Remove Predefine Reference Pointer
Extract Subjects/ Objects
The automated activity continues with removing the predefine reference pointers after the texts within the brackets/ dashes recorded.
The automated activity extracts the texts between brackets/ dashes and categorizes it into subjects or objects.
Link Subjects/ Object to Head Sentences The automated activity creates link to connect the subjects and objects to the head sentences
Resume Requirement Statements based on the noun phrase (NP) or the verb phrase (VP) of the sentences before the brackets/ dashes. Relationship semantically identified as
<refers to>.
The automated activity resumes requirement statements after the process completes the subactivities to locate all brackets/ dashes in the statement, remove the brackets/ dashes, subjects/
Locate Bridge-term objects extraction and link the subjects/ objects to the head sentences.
The process continues with completing the bridge-term activity. Sub-activity automatically started by locating bridge-term in the statements.
This sub-activity will be repeated until all bridge-term located.
Analyze Bridge-term pattern Automatically, when a bridge term located a program method based on NLP analysis uses to recognize the potential concepts to remove any ambiguity and to provide understanding of the
6 | P a g e
Extract SPO
Mine Association context of the phrase (Omoronyia et al., 2010).
Correct Bridge-term
Tag Correction
Control Text Construction
Locate Subject-Predicate-Object (SPO)
Control SPO
Define Concept
Automatically the identified bridge-term will be corrected based on the bridge-term pattern analysis results.
The corrected bridge-term texts will be tagged automatically to mark the changes.
A domain expert manually controls the text construction to locate any ambiguity and if necessary provide a better term to make the requirement statement more understandable.
Automated activity continues with Subject-
Predicate-Object (SPO) extraction. The first sub-activity is locating subject-predicate-object from the requirement descriptions after bridgeterm correction. SPO itself defines as a parsing process that locating the phrase tree of a sentence to depict declarative clauses between the associated subject – object and their predicate (Omoronyia et al., 2010). This subactivity will be repeated until all SPO located.
The located SPO controlled to differentiate the association between predicate with it subject or object.
Automatically – based on logic of ontology, the located subjects and objects define as concept.
The logic explained as follow each sentence consists of a noun phrase (NP) and a verb phrase
(VP). The NP contains subject that identified by a POS variant from noun (e.g. singular, plural), while the VP contains the predicate and object
(Omoronyia et al., 2010).
Define Relation
Resume text after SPO extraction
Locate Head
Link Head with Subject/ Object
The activity continues automatically by define the relation based on the predicates. These predicates define the semantic relations between concepts.
Automatically the text will be resumed after subject-predicate-object (SPO) extraction.
The activity automatically continues by locating and identifying the head as part of mine association between head and subject-object.
This sub-activity will be repeated until all head located and identified.
Automatically the head will be linked with the subject/ object from the previous subject/ object extraction activity.
Extract prepositional phrase (PP) relation Activity continues automatically by extracting the prepositional phrase (PP) relation. The logic normally consists of a preposition and an object of a preposition. System sets a potential subject/ object as an object in PP. The subject/ object contained in the first noun-phrase will be considered as the head of subject/ objects, while any other objects/ subjects in the statement
7 | P a g e
Cluster Concept Match Lexical similarity
Generate Taxonomy Tree
Resume Clusters
Elicit Requirements without PP are set as head object/ subjects directly. Preposition made based on semantic point a view, in order to provide temporal or spatial relationship illustration between the objects/ subjects of the prepositional phrase and object/ subjects of the previous noun-phrase.
(Omoronyia et al., 2010).
Activity automatically continues with cluster the finding concepts. Firstly, a technique based on
Vector Space Model uses to match the concept based on its lexical similarity is used to merge any different semantic graphs and remove any similarities in the finding concepts and relations.
Activity continues by generating a taxonomy tree which based on terms similarity that used to represent different concepts.
Automatically process will resume all the concepts and relations into clusters.
Based on the results Requirement Analyst will be able to elicit the requirements before start analyze, negotiate and validate the requirements
(Kotoya & Sommerville, 1998)
Table 3. Concepts
CONCEPT DESCRIPTIONS
REQUIREMENT DOCUMENT A document contains detail requirement statement. This apply to any type of domain knowledge. The example given in Omoronyia et al.
(2010) assess requirement statement for Automatic Cruise Control which relate to mechanical engineering - while the example I provided in the previous assignment is an IS requirement statement.
SYMBOL/ SPECIAL CHARACTER LIST A list of symbol and/ or special character discovered in the requirement statement (Omoronyia et al., 2010).
EDITED SYMBOL/ SPECIAL
CHARACTER LIST
PREPARED REQUIREMENT
A list of edited symbol and/ or special character discovered in the requirement statement (Omoronyia et al., 2010).
A soft-copy document contain requirement statement that no longer contain symbol/ special charachter (Omoronyia et al., 2010).
EXTRACTION
BRACKET/ DASH
A document contain texts extraction gathered during brackets-trailing activities (Omoronyia et al., 2010).
A list of text in the bracket or dash (Omoronyia et al., 2010).
SUBJECT/ OBJECTS (SO)
SO – HEAD SENTENCE
REQUIREMENT AFTER TRAIL-
BRACKETS
BRIDGE-TERM
BRIDGE-TERM PATTERN
CORRECTION
TAG
A subject and object identified in the requirement (Omoronyia et al.,
2010).
A list of relation between subject-object (SO) with the head concept of the text in the requirements statement (Omoronyia et al., 2010).
A resumed requirement statement after all the brackets trailed
(Omoronyia et al., 2010).
A list of bridge-term in the requirement statement (Omoronyia et al.,
2010).
An identified pattern of the bridge-term in the requirement statement
(Omoronyia et al., 2010).
A list of correction made in the text after bridge-term analyze and identify (Omoronyia et al., 2010).
A list of tag link to the correction that made for each bridge-term
(Omoronyia et al., 2010).
8 | P a g e
RECONSTRUCTED REQUIREMENT
SUBJECT-PREDICATE-OBJECT
SPO LIST
CONCEPT
RELATION
SPO EXTRACTION
HEAD ASSOCIATION
HEAD – SUBJECT/ OBJECT
PP RELATION
SEMANTIC CLUSTER
LEXICAL SIMILARITY
TAXONOMY TREE
CLUSTERED
ELICITED REQUIREMENT
A document contain a reconstructed requirement statement after bridgeterm removed and already controlled by requirement analyst to avoid any ambiguity terms (Omoronyia et al., 2010).
A document contain subject-predicate-object (SPO) of the requirement based on results of extraction of SPO, concept and relation (Omoronyia et al., 2010).
A list of SPO (subject-predicate-object) identified in the text
(Omoronyia et al., 2010).
A list of define concept based on the SPO extracted from the requirement statement (Omoronyia et al., 2010).
A list of define relation based on the SPO extracted from the requirement statement (Omoronyia et al., 2010).
A document resumed as the complete subject-predicate-object extraction (Omoronyia et al., 2010).
A list of concept link with the head of subject-object identified in association mining activity. It derives from the generated subject-object during the extraction (Omoronyia et al., 2010).
A list of link between subject and head (as domain) and it's relation that derived from the extracted predicates (Omoronyia et al., 2010).
A list of prepositional phrase (PP) relation. PP is consist of a preposition and an object of preposition. The potential subject-object define as object in PP. PP used to define temporal or spatial relationship between subject-object. Based on NLP method any subject-object sentences without PP by default are head-subject/ object (Omoronyia et al., 2010 & Liddy, 2001).
A list of concepts and relations clusters, extracted from the preceding activities (SPO analysis and association mining) (Omoronyia et al.,
2010).
A list of lexical similarity in the concept-relation. It is necessary to match lexical similarity in order to remove any repetitive concept and relation (Omoronyia et al., 2010).
A list of taxonomy tree. It based on similarity between terms used to describe different concept (Omoronyia et al., 2010).
A list of the resumed clustered concept after the completion of subactivities matching lexical similarity and generated taxonomy tree
(Omoronyia et al., 2010).
A document as deliverable of the requirement elicitation with rulebased baseline ontology method. It is a part of the requirement document (Omoronyia et al., 2010 & Kotoya & Sommerville, 1998).
Related Literatures
The following sections discuss the reference techniques and researches used as basic in developing the
Rule-based ontology for requirements elicitation method.
Natural Language Processing (NLP) techniques use as research basic concept in extracting relations and concepts from requirements statement to specific domain ontologies as its adapts the human processing the language, which gives the correct support reasoning in processing the given tasks (Liddy, 2001). NLP - part-of-speech (POS) tagging and sentences parsers tools developed by Stanford parser/tagger and
OpenNLP used. It marks the corresponding relation between a text to one specific part of statement or speech based on its definition and context (Choi, 2000).
9 | P a g e
In completing this research, the team also conducted some preliminary research on requirements engineering (Falbo, 2002) and conceptual structures (Sowa, 1994). The objective of this preliminary research is to understand the steps that requirements analyst takes in order to elicit the requirement with ontology method and the Artificial Intelligence mechanism that based on synthesis of logic and linguistics.
A model developed by Kof (2005) used as case reference in adapting the natural language processing in modeling the method. The three extraction steps ( extract terms, cluster term and locate associations between terms ) expanded in the designed method.
This method evaluated in one industrial case and it concludes as one feasible approach for requirement elicitation with ontology in semi-automatic way, as it explained by the authors. Further, this method used by other scholars as reference in constructs and explores a new domain ontologies based on requirement elicitation (Farfeleder et al., 2011), requirement specification (Bures et al., 2012), and defining a shared understanding between engineers and client (Zhang, 2011).
10 | P a g e
References
Bures, T., Hnetynka, P., Kroha, P., & Simko, V. (2012). Requirement Specifications Using Natural
Language.
Technical Report D3S-TR-2012-05. Czechoslovakia: Charles University in Prague.
Choi, F.Y.Y., (2000). Advances in domain independent linear text segmentation. Proceedings of the 1st
North American Chapter of the Association for Computational Linguistics Conference , (pp. 26-33).
Falbo, R.d.A., Guizzardi, G., & Duarte, K.C. (2002). An ontological approach to domain engineering.
Proceedings of the 14th International Conference on Software Engineering and Knowledge Engineering,
(pp. 351-358).
Farfeleder, S., Moser, T., Krall, A., Stålhane, T., Omoronyia, I., & Zojer, H. (2011). Ontology-Driven
Guidance for Requirements Elicitation. Lecture Notes in Computer Science Volume 6644 (pp. 212-226),
Berlin, Germany, Springer Berlin Heidelberg. 0302-9743.
Gruber, T.R. (1993). A translation approach to portable ontology specifications. Knowledge Acquisition
5(2) , (pp.199-220).
Gruninger, M. & Lee, J., (2002) Ontology: Applications and Design. Communication of the ACM , 45(2),
(pp. 39-41).
Kaiya, H. & Saeki, M., (2006). Using Domain Ontology as Domain Knowledge for Requirements
Elicitation. Requirements Engineering, 14th IEEE International Conference, (pp. 189-198).
Kof, L. (2005). An Application of Natural Language Processing to Domain Modelling - Two Case
Studies. International Journal on Computer Systems Science Engineering 20 , (pp. 37–52).
Kotonya, G. & Sommerville, I., (1998). Requirement Enginnering. John Wiley & Sons.
Lee, Y. & Zhao, W., (2006). Domain Requirements Elicitation and Analysis – An Ontology-Based
Approach. Proceedings of the First International Multi-Symposiums on Computer and Computational
Science , (pp. 805-813).
Liddy, E.D. (2001). Natural Language Processing. Encyclopedia of Library and Information Science, 2nd edn . Marcel Decker, Inc., New York, USA.
Maedche, A. & Staab, S., (2001). Ontology Learning for the Semantic Seb. Intelligent Systems, IEEE , (pp.
72-79).
Omoronyia, I., Sindre, G., Stålhane, T., Biffl, S., Moser, T., & Sunindyo, W. (2010). A Domain Ontology
Building Process for Guiding Requirements Elicitation. In R. Wieringa and A. Persson (Eds.): REFSQ
2010, LNCS 6182, (pp.188–202).Springer-Verlag Berlin Heidelberg.
Shibaoka, M., Kaiya, H. & Saeki, M., (2007). GOORE: Goal-Oriented and Ontology Driven Requirements
Elicitation Method. ER Workshops 2007 , (pp. 225-234)
Zhang, X. (2011). An Interactive Approach of Ontology-based Requirement Elicitation for Software
Customization. Electronic Theses and Dissertations , University of Windsor, Windsor, Canada.
11 | P a g e
Appendix – Rule-Based Baseline Ontology Extraction – Use cases of the noun phrase (NP) parse tree pattern for bridged-terms
(Automated activities)
The following figure 4 shows the patterns of the noun phrase (NP) tree in which bridge-terms can occur within the requirement statements. The leaf nodes in case 1 consist of a singular noun node or a sequence of singular nouns (NN) nodes separated with commas (,) on the left of and/or conjunction. The left of and/or conjunction are built in a sequence of NN nodes where NN consider as the last node, plural noun is
NNS, proper singular noun is NNP and proper plural noun is NNPS. Meanwhile, on the right hand side of the and/or conjunction where there are no commas.
The case 2 is similar to case 1 – the main difference is the first leaf node of the case 2 consider as an adjective (JJ) qualifier and there is no commas separating the first two leaf nodes in the sequence of how the prototype read the statements.
Figure 4
12 | P a g e