Process-Oriented Requirement Analysis Supporting the Data Warehouse Design Process A Use Case Driven Approach Beate List, Josef Schiefer, A Min Tjoa Institute of Software Technology (E188) Vienna University of Technology Favoritenstr. 9 - 11 / 188, A-1040 Vienna, Austria {list, js, tjoa}@ifs.tuwien.ac.at Abstract. Intelligent and comprehensive analysis systems are a powerful instrument for companies to analyse their business. The implementation of such analysis systems for an enterprise-wide management and decision support can be very different from traditional software implementations. Because analysis systems are strongly data-driven, the development process is highly dependent on its underlying data, which is generally stored in a data warehouse. Data warehouse systems generally concern many organisational units. Therefore, the collection of unambiguous, complete, verifiable, consistent and usable requirements can be a very difficult task. Use cases are considered as standard notation for object-oriented requirement modelling. We illustrate how use cases enhances communication between domain experts, data warehouse specialists, data warehouse designers and other professionals with different backgrounds. This paper explains how use cases can be used to elicit requirements for data warehouse systems, and how to involve the organisational context in the modelling process. With an adapted object model, we demonstrate how to capture different analysis perspectives of the business process. We develop a predefined set of dimension objects that belong to every classic business process and are able to create various fact objects, representing these perspectives. 1. Introduction A data warehouse is a common queryable source of data for analysis purposes, which is primary used as support for decision processes. It is multidimensional modelled and is used for the storage of historicized, cleansed, validated, synthesized, operative, internal and external data. Stakeholders of a data warehouse system are interested in analysing their business processes in a comprehensive and flexible way. Mostly they already have a comprehensive understanding of their business processes, which they want to explore and analyse. What they actually need is a view of their business processes and its data, which allows them an extensive analysis of their data. For this purpose data warehouses are modelled multidimensional, which corresponds to a typical view of its users. This analysis view of the business processes can be very different to the general view even though the underlying process is the same. Hence it is necessary to elicit requirements from the stakeholder of a data warehouse, which belong to their analysis views. The design of data warehouse system is highly dependent on these requirements. Very often data warehouses are built without understanding correctly these needs and requirements and consequently fail for that reason. The following scenario can often depict the implementation of a new data warehouse system in an enterprise: a system analyst of the IT department or consultant works together with the users to describe the needs and requirements for a data warehouse system. A team of developers receives these descriptions, but they have trouble understanding the business terminology and find the description too informal general to use for implementing the data warehouse system. The developers write their own system specification from a technical point of view. When the system specification is presented to the users, they do not quite understand it because it is too technical. They are, however, forced to accept it in order to move forward. This approach can easily result in a system that does not meet the requirements of the users because often the users, the system analysts and developers don’t speak the same language. Such communication problems can make it difficult to turn a description of an analysis system into a technical specification of a data warehouse system that all parties can understand. In addition, because a technical system specification that is not fully understood by the users, a data warehouse system becomes too difficult or impractical for the intended purposes. Therefore, it will not deliver the expected effect to the company. In these cases often departments will develop data marts for their own purposes, which can be considered as stovepipes and makes an enterprise-wide analysis system impossible. The challenge is to model a data warehouse system in a way that is both precise and user-friendly. Each symbol describing the analysis process should be intuitive for the user and have defined semantics, so that the developers can use the description as a general, but precise specification of the data warehouse system. Use cases have two advantages, which make them suitable for representing the requirements for a data warehouse system. First, use case diagrams are part of the UML standard and hence are generally accepted as a notational standard in the software community and second, they can be used on a general level, where implementation details are completely suppressed. Hence, use cases are very suitable for the communication with the users of the data warehouse system. The IEEE guide [9] states that care should be taken to distinguish between the model for the application and the model for the software, which is required to implement the application model. The model, which is to be used as the basis for the specification is crucial, as it must integrate widely-differing types of information from users. Important characteristics that such a model should possess are [9]: A high level of abstraction. The model should be at the level of the users’ views of the desired system, and should capture their concepts directly. It should not indiscriminately mix this high level information with information that is relevant to lower levels of the development process. Human-readability. The language in which the model is to be expressed will be used for validating the specification: that is, for presenting the specification to users for their views on its contents. Human understandability is thus the prime concern; and there is a need to avoid the well-known problems that users experience in trying to understand formal requirements specification languages. Precision. A high-level specification language, for project scope agreement, delineating the system boundaries and naming major objects, rules and processes, is required. Further detail should be left to subsequent development phases. However, the language must be precisely defined, to reduce ambiguity, and to allow for formal integration and consistency checking methods to operate on this representational form. Specification completeness. It is important that the model captures all aspects of a specification, particularly the information needs of the users. Mappability to later phases. A requirements definition phase will typically be followed by a detailed analysis and design phases. Therefore, the model should possess a structure suitable for mapping on the later phases. In the following paper we show an object-oriented approach for a process-oriented requirement analysis, which grasp the stated model characteristics. The remainder of this paper is organized as follows. In section 2 we discuss related works. In section 3 we show how data warehouse requirements arise and their impacts on the data warehouse system. In section 4 we consider the organisational context. Section 5 and 6 show how to apply use cases and object models for the requirement analysis process. The paper concludes with section 7, which presents our current and future work. 2. Related Work and Business Process Orientation Building a data warehouse is different than developing transaction systems, whereby the requirement analysis process for the latter is supported by numerous methods [2], [4], [6], [12]. Up to now the data warehouse design process has not been supported by a formal requirement analysis method although there are some approaches for requirement gathering. In [19] a catalogue for conducting user interviews in order to collect end user requirements is proposed. Inmon argues in [10] that the data warehouse environment is data driven, in comparison to classical systems, which are requirement driven, and the requirements are understood after it is populated with data and being used by the decision support analyst. He derives the data model by transferring the corporate data model into a data warehouse schema and by adding performance factors. In our opinion, requirements can and must be gathered before the data warehouse design process otherwise only those parts are captured, which are in the basic corporate model. [1] states that a data warehouse is designed to support the business process rather than specific query requirements, but do not further discuss their statement. An other process driven approach is applied by Kimball [13], whereby the fundamental step of the design process is based on choosing a business process to model. As this approach has proven its success in various projects, and enterprises in general have shifted to process-centred organising, we adopt the process oriented approach for the basis of our work: a formal requirement analysis concept for data warehousing. The process approach for data warehouse design enables organisations to focus on the performance of business processes and drift away from traditional task or department performance measurement. Hammer also criticised in [8] that everyone is watching out for task performance, but no one has been watching to see if all the tasks together produce the results they are supposed to, for the customer, the single most important word in the definition business processes. The change to process centring is not primarily a structural one (although it has deep and lasting structural implications), it is not announced by issuing a new organisational chart and assigning a new set of managerial titles, but process centring is a shift in perspective, in which tasks and processes exchange places [8]. In practice this means that the traditional hierarchical organisation remains and the process view is integrated into the industrial way of organising. Therefore business processes cross organisational boundaries and often tend to be inefficient because of changing responsibilities, long delay times and so forth [15]. Therefore, a process oriented data warehouse design focuses on the entire process including all involved departments and supports a development towards a process-centred organisation. In [18] we have already started analysing and presenting the requirements for a data warehouse model with a process approach. Use cases and objects were applied as proposed by [12]. We realised that the concept is confined to business processes without any analysis capabilities for decision support purposes and an adaptation to these needs would be required. But we further realised that the support of different views of the models is an opportunity to capture multidimensional and aggregated views of data warehouses. Basically use case models and object models can be applied to software processes and business processes, which use the same notation but represent different functionality. As argued above, we aim at analysing business processes and focus therefore on the business process approach provided by these models and described in [12]. The use case model describes the business from an external point of view and provides an overview of the area of interest, while the object model is an internal model and describes accurately each business process with its tasks and resources in order to make the model clearer [12]. Use cases represent business processes. In this paper we present the adaptation of the use case and object model in order to support the data warehouse requirement analysis. The aim is to provide a formal requirement engineering method that is easy to use, quickly to understand, covering all major data warehouse model characteristics and is therefore a means of communication between all involved parties in the requirement process. We also apply the model to a classical business process that has already been proven in a real data warehouse environment. 3. Data Warehouse Requirements Requirements to the data warehouse system determine what data must be available, how it is organized, and how often it is updated. Furthermore the requirements enable the stakeholders to communicate the purpose, establish the direction and set the expectations of information goals for the enterprise. It is advantageous to start the requirement collection process with a macro business discovery (see Figure 1), which is necessary to identify the key business processes, key business metrics, measures and requirements. A high-level model has to be developed depicting how business is conducted. The collection of the requirements of the stakeholder is used to create a vision document, which describes the high-level properties of the data warehouse system. These high-level properties express services, which have to be provided to meet the requirements of the stakeholders. This relationship between high-level properties of the data warehouse system and its services for the stakeholder leads to a derivation of the data warehouse requirements. Macro Business Discovery - Identification of * key business processes * key business requirements * key business metrics, measures Micro Business Discovery - Detailed model of business requirements - Model for business processes - Model for organizational structures Data Warehouse Requirements Figure 1: Requirement Discovery Process for Data Warehouse Requirements The micro business discovery is an in-depth analysis of the requirements of the organization as defined by the macro business discovery. It describes the business requirements in more detail by considering these requirements in context of the models for the business processes and the organisational structures. Figure 2 shows the impacts of data warehouse requirements [14]. The figure demonstrates very well, that data warehouse requirements directly affect technical aspects of the data warehouse system. The aim of data warehouse requirements is to provide a comprehensive specification of primary technical aspects of the data warehouse system, to make a successful implementation possible. Technical Technical Architecture Architecture Dimensinal Dimensinal Model Model Project Project Planning Planning && Management Management End End User User Applications Applications Data Data Warehouse Warehouse Requirements Requirements Data Data Procurement Procurement Deployment Deployment Maintenance Maintenance && Growth Growth Physical Physical Design Design Data Data Conditions Conditions (Data (Data Quality, Quality, LoG...) LoG...) Figure 2: Impact of Warehouses Requirements (adapted from [14]) In the requirement collection process data warehouse requirements have to be driven by the business requirements. They arise from the macro and micro business discovery and result in a comprehensive technical specification. Because the business and technical specifications can be very different, business requirements and (technical) data warehouse requirements should be modelled separately. Business requirements describe the needs of the stakeholders from their business point of view. On the other hand data warehouse requirements are derived from the business requirements and their aim is to provide a comprehensive, precise specification for the data warehouse team. They describe technical data warehouse aspects, whereby the target group is the data warehouse team and not the stakeholders. Current use case oriented approaches for the requirement analysis are used to describe technical aspects and ignore the business view [5]. Originally use cases were designed to describe primary system behaviour and the involved actors. For data warehousing systems there are other important aspects than system behaviour. Further important aspects are the actual data and information, which are stored in the data warehouse. By integrating this central aspect in the requirement modelling process we present an approach of how to gather the business requirements from stakeholders corresponding to their business processes. In the requirement analysis process it is important to consider the organisational context of the stakeholders. Furthermore it is important, to classify them in order to provide a comprehensive picture of the future data warehouse users. Next section gives an overview of the stakeholders and their organisational context. 4. Organisational Context A data warehouse is intended to provide an enterprise-wide decision support for an organisation. It should address the users of all hierarchy levels of the organisation. This broad spectrum of different types of end users requires information at different granularity levels to meet their specific needs. Figure 3 shows the often-found structure of traditional organizations. Within such a structure, different requirements for data processing and data analysis can be identified, which correspond to the different layers of the organisation. As we move upward through the layers of the hierarchy, for example, the level of granularity required decreases. Executives use highly summarized information, while line managers work at a detailed level. Administrative users need to create and maintain individual data items such as orders or customer records. Professional users and line managers require statistical analysis tools. Executives require systems that highlight anomalies or geographically show key indicators, and allow drill-down in problem areas. During the micro business discovery requirements of data warehouse users of all different levels in the hierarchy are identified. This discovery process should allow a combination of all requirements to model a coherent enterprise-wide decision support system. What data warehouse users require is an environment that allows data coherence and functional integration, enabling them to achieve their business goals. Localised Decision Support Systems Enterprise-wide Decision Support Systems Executive Executive Management Management Professional Professional Administration Administration Figure 3: Localised vs. Enterprise-wide Decision Support Systems Figure 3 shows an enterprise-wide decision support system with its organisational hierarchy, which can achieve these business goals. For gathering the meshed requirements of such a decision support system, it is necessary to use models which represent all business requirements on each hierarchy level and which allow a requirement consolidation of all levels. 5. Use Case Model The use case model captures business processes in the company that satisfy the customers’ interests, and the interests of others outside the company (partners, suppliers, etc) [12]. Our use case model describes business processes in the company that are analysed and provide information to the data warehouse user and the company’s staff. The model shows how analysts use the data warehouse. It provides therefore a specification of the data warehouse user’s roles and the business processes they are analysing. Analysis System and Subsystems. The analysis system or subsystem is the modelling concept we use to symbolise the business or area of responsibility that we analyse. We use the same symbol as Jacobson in 0, the rectangle with rounded corners. Analyst. The analyst represents a role that someone or something in the environment can play in relation to the analysis system. In our model analyst represent the environment that analyse business processes belonging to the specified business system. Use Case. The use case is our construct for a business process that is analysed. Uses Association. The uses association is a must association and describes use cases that are aggregated into other use cases. This concept must be applied when parts of the use case are analysed by other analysts. Example We have applied our entire data warehouse use case model in the example of Figure 4. The analysis system is a grocery store with a warehouse and a market as subsystems. The use case ‘Managing Demand’ represents the business process that receives the goods in the warehouse, manages the storage in the warehouse, delivers the goods to the supermarket and manages the storage in the market until they are sold. This use case is analysed by the fulfilment manager and aggregates the managing order use case and the selling use case. We have applied a uses association to managing order and selling, because managing demand combines and therefore aggregates both use cases. The managing order use case belongs to the warehouse and is analysed by the logistic manager and the selling use case belongs to the market and is analysed by the supermarket head. Grocery Store Managing Demand >>uses<< Fulfilment Manager Warehouse Managing Order >>uses<< Market Selling Logistic Manager Supermarket Head Figure 4: Example of an Analysis System 6. Object Model The object model provides a clear picture how the use case is structured internally in order to realise the analysis capabilities of the process (backward engineering) and to describe the analysis requirements of various actors (forward engineering). The internal structure consists of fact and dimension objects. Dimension Objects Business processes have a lot of characteristics in common. Analyses of various traditional and fully accepted business process definitions provide various process attributes that reflect the structure of classical business processes. As we apply an object-oriented paradigm to business processes in general, and the object model to the internal structure, we have to change our point of view and look for objects in the business process instead of attributes. Describing processes by the means of objects gives more importance to the internal structure and additionally empowers to become an equal part of the process. Davenport states in [3] that a process is a structured, measured set of activities designed to produce a specified output for a particular customer or market. He notes that a business process is a specific ordering of work activities across time and place, with a beginning, an end, and clearly identified inputs and outputs: a structure of action. Hammer argues in [7] that a business process is a group of tasks that together create a result of value to the customer. A business process is a collection of activities that takes one ore more kinds of input and creates an output that is of value to the customer. Jacobson’s approach in [12] is, that the purpose of each business process is to offer each customer the right product or service (that is, the right deliverable), with a high degree of performance measured against cost, longevity, service and quality. Davenport’s process definition comprises measure, activity, customer, market, time, place, input and output objects. Hammer’s definition consists of activity, result, customer and input objects, and Jacobson focuses on the customer, product, service, deliverable and measure objects. Davenport provides the most comprehensive definition, which basically includes the objects of both other definitions. The objects in the process definitions highlight key business process characteristics, but represent also classical data warehouse dimensions (e.g. organisation, customer, product, service, time, etc.) and data warehouse facts (measure). As these key business process objects can be found in any classical business process, we propose a standard set of dimension objects representing data warehouse dimensions (Figure 5): Customer Deliverable Organisation Time Figure 5: Data Warehouse Main Dimension Objects in Classic Business Processes Customer Dimension Object. The single most important word in the definition of process is ‘customer’ and a process perspective on a business is the customer’s perspective [8]. Our strong focus on business processes, with the satisfied customer as the main target of the process, leads directly to the customer dimension object. Deliverable Dimension Object. Davenport and Hammer do not specify the output [3] or result of value [7] of the process in their definition, while Jacobson’s approach is to offer the right product or service to the customer [12] and integrates both terms into deliverable. The customer does not see or care about the company’s organisational structure or its management philosophies; the customer sees only the company’s products and services, all of which are produced by its processes [8]. As the output or result of the value of a process might be the opposite of the defined process deliverables, and products or services are too specific for a requirement analysis model, we cover the process targets with the deliverable dimension object. Organisation Dimension Object. Business processes typically flow through several organisational units and cross a lot of responsibilities. Leymann and Roller argue in [15] that business processes that often cross organisational boundaries tend to be inefficient because of changing responsibilities, long delay times, etc. and further that it is obvious that the process reflects the hierarchical structures of the organisation. The analysis of the organisational structure detects occurrences that are caused by certain units and therefore organisation dimension object is required. Time Dimension Object. A business process is a specific ordering of work activities across time with a beginning and an end [3]. The end of the process represents a point in time where the results of the process are delivered and cannot be changed anymore. We address this point for the time dimension object. Fact Object. Performance is measured against cost, longevity, service and quality [12]. These measures (e.g. turnover, profit, ratio rates etc.) represent facts in data warehouse notation and fact object in this paper. As business processes often consist various facts, which are created by different dimensions, we introduce a communication association to demonstrate the communication - the fact - that is created by certain dimension objects (see Figure 6). Customer Measure 1 Measure 2 Deliverable Organisation Time Figure 6: Data Warehouse Fact Objects 7. Conclusion In this paper we presented the adaptation of use case and object models for modelling business requirements for data warehouse systems to support the data warehouse design process. We showed how data warehouse requirements are derived from business requirements and their organisation context. Our use case model is an excellent means of both expressing requirements with regard to the data warehouse and providing a comprehensive picture of what the data warehouse is intended to perform. It illustrates the function of the business, the process analysts and the business process to be analysed with its aggregations. The object model is an internal model that captures different analysis perspectives of the business process. Various facts represent these perspectives, which are influenced by dimension objects. In our future work we will concentrate improving the discussed models in order to support more appropriately the data warehouse design process. References [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19] Anahory, S. Murray, D., Data Warehousing in the real World – A practical Guide for Building Decision Support Systems, Addison-Wesley Publishing 1997. Coad, P., Yourdon, E., Object-Oriented Analysis. Englewood Cliffs, N.J.: Prentice Hall 1991. Davenport, Th., Process Innovation: Reengineering Work through Information Technology, Harvard Business School Press 1993. Davis, A., Hisa, P., Status Report: Requirements Engineering, IEEE Software November 1993. Debevoise, N. T., The Data Warehouse Method, Prentice-Hall, 1999. Finkelstein, A., Requirements Engineering Research: Coordination and Infrastructure, Requirements Engineering 1, 1996. Hammer, M., Champy, J., Reengineering the Corporation – A Manifesto for Business Revolution, Harper Collins Publishers 1993. Hammer, M., Beyond Reengineering, Harper Collins Publishers 1996. IEEE, IEEE Guide to Software Requirements Specifications, IEEE Inc., 1984 Inmon, W. H., Building the Data Warehouse, Wiley & Sons 1996. Jacobson, I., Christerson, M., Jonsson, P., Overgaard, G., Object-Oriented – Software Engineering – A Use Case Driven Approach, Addison Wesley 1992. Jacobson, I., Ericson, M., Jacobson, A., The Object Advantage – Business Process Reengineering with Object Technology, ACM Press, Addison-Wesley Publishing 1995. Kimball, R., The Data Warehouse Toolkit: Practical Techniques For Building Dimensional Data Warehouse, John Wiley & Sons 1996. Kimball, R., Reeves, L., Ross, M., Thornthwaite, W., The Data Warehouse Lifecycle Toolkit, John Wiley & Sons, 1998 Leymann, F., Roller, D., Production Workflow – Concepts and Techniques, Prentice Hall PTR, 2000. List, B., Schiefer, J., Tjoa, A. M., Quirchmayr, G., The Process Warehouse - A Data Warehouse Approach for Business Process Management, In e-Business and Intelligent Management - Proceedings of the International Conference on Management of Information and Communication Technology (MiCT1999), Copenhagen, Denmark, 1999. List, B., Schiefer, J., Tjoa, A. M., Quirchmayr, G., The Process Warehouse Approach for Inter-Organisational e-Business Process Improvement, To appear in Proceedings of ReTechnologies for Information Systems (ReTIS2000) integrated in Reengineering Week 2000, Zurich, Switzerland, 2000. List, B., Schiefer, J., Tjoa, A. M., Quirchmayr, G., A Generic Data Model for the Process Warehouse - An Approach for Multidimensional Business Process Analysis, Proceedings of Business Information Systems 2000 (BIS 2000), Poznan, Poland, 2000. Poe, V., Building a Data Warehouse for Decision Support, Prentice Hall 1995.