S-DWH Business Architecture: Generic Process Models
Summary:
- The Data Warehouse approach as a Single Coherent Production System.
- Business process analysis of a SDWH: Generic Statistical Business Process Model (GSBPM).
(end of first part)
- Business Process Model and Notation (BPMN).
- Mapping the BPM-Notation on a SDWH layered architecture: interactive session.
National Institute of Statistics – Italy
Antonio Laureti Palma - IT - Structural Business Statistics Unit
Mauro Masselli - Head of Structural Business Statistics Unit
Workshop “ESSnet on Micro data linking and warehousing in statistical production”, Cardiff, 23-24 May 2012

Statistical stove-pipe-like production system
In a stove-pipe production system, every single production line corresponds to a specific statistical topic and has its own production system. From survey design to dissemination, the whole production process for each topic runs independently, with its own data suppliers and user groups.
Figure: separate production lines (survey data, elaboration, output) for Structural Business Statistics (SBS), Short Term business Statistics (STS), Information Society (IS), Science Technology Innovation (STI) and the Business Register (BR), plus administrative data.

The Data Warehouse approach as a Single Coherent System
In our vision, the Statistical Data Warehouse (SDWH) for business statistics is a central statistical data store for managing all available data of interest, regardless of the data's source. It improves the NSI's ability to:
- (re)use data to create new data and new outputs;
- perform reporting;
- execute analysis;
- produce the necessary information.
Figure: the same topics (SBS, STS, IS, STI, SBR) and administrative data feeding a single Data Warehouse, from survey data through elaboration to output.

To provide such a DWH architectural vision in the context of statistics production, we could use a DWH architecture model with four functional layers:
I. Source layer: the level at which we locate all the activities related to storing and managing internal (survey) or external (archive) raw data sources.
II. Integration layer: this layer performs the typical Extraction, Transformation and Loading (ETL) functions, which must be carried out automatically or semi-automatically.
III. Interpretation and data analysis layer: specialized for interactive, non-structured activities.
IV. Access layer: addressed to a wide range of users and IT tools for the final presentation of the information sought.

The SDWH is then at the center of a corporate statistical information domain (a group of topics), in which all production processes follow a coherent design. Defining and enabling its evolution requires creating, communicating and improving the key requirements, principles and models. A high level of coordination is necessary within different topics, within the activities of different operational phases, and between topics and activities.

Consequently, any NSI adopting the SDWH approach as a single coherent system, or part of it, should adapt its business production architecture, information system architecture and technology architecture:
- The Business Architecture aligns strategic objectives and tactical demands. It provides a common understanding of the organization, described by its management processes and business operational processes.
- The Information Systems Architecture is, in our context, the design of an effective SDWH, in terms of data and metadata, that can support tactical demands.
- The Technology Architecture is the combined set of software, hardware and networks used to develop and support IT services.

The key business processes within the business architecture for statistics that should generally be considered are:
- Statistical program management (new or redesigned processes);
- Project management;
- Resource management;
- Operational management (data and metadata generated);
- Metadata management (metadata generated and processed within each phase);
- Data management (security, custodianship and ownership);
- Customer management (promoting statistical products);
- Statistical framework (standards, methodologies and concepts);
- Quality management (assessment and control);
- Burden management;
- Knowledge management;
- Software and IT infrastructure management.

Modeling the Business Architecture
In statistics, a possible standard representation for describing a Business Architecture is the Generic Statistical Business Process Model (GSBPM). It uses 9 phases to describe and define the set of statistical business processes required to produce official statistics: 1 Specify Needs, 2 Design, 3 Build, 4 Collect, 5 Process, 6 Analyze, 7 Disseminate, 8 Archive, 9 Evaluate. Each phase is broken down into several statistical sub-processes and, according to process modelling theory, each sub-process should have a number of clearly identified attributes (input, output, owner, purpose, guide, enablers, feedback, ...).
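The attribute list for a sub-process maps naturally onto a small record type. The sketch below is a hypothetical illustration and not part of the GSBPM itself. It encodes the nine phases and one sub-process with the attributes named above, derives the phase from the sub-process code, and reports which attributes are still unidentified. The class names, field names and example values are all assumptions.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Phase(IntEnum):
    """The nine GSBPM phases, numbered as in the model."""
    SPECIFY_NEEDS = 1
    DESIGN = 2
    BUILD = 3
    COLLECT = 4
    PROCESS = 5
    ANALYZE = 6
    DISSEMINATE = 7
    ARCHIVE = 8
    EVALUATE = 9


@dataclass
class SubProcess:
    """A GSBPM sub-process described by the attributes listed in the text.
    The source list is open-ended ("..."), so this attribute set is illustrative."""
    code: str                                          # e.g. "5.3"
    name: str
    owner: str = ""
    purpose: str = ""
    inputs: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)
    guides: list[str] = field(default_factory=list)    # e.g. methods, standards
    enablers: list[str] = field(default_factory=list)  # e.g. staff, IT systems
    feedback: list[str] = field(default_factory=list)  # e.g. loops back to earlier steps

    @property
    def phase(self) -> Phase:
        # The phase is the number before the dot in the sub-process code.
        return Phase(int(self.code.split(".")[0]))

    def missing_attributes(self) -> list[str]:
        """Attributes not yet identified, i.e. still empty."""
        names = ("owner", "purpose", "inputs", "outputs", "guides", "enablers", "feedback")
        return [n for n in names if not getattr(self, n)]


# Hypothetical example values for sub-process 5.3.
review = SubProcess("5.3", "review, validate & edit", owner="Register unit",
                    purpose="detect and correct errors in the data",
                    inputs=["integrated data"], outputs=["validated data"])
print(review.phase.name, review.missing_attributes())
```

Treating every empty attribute as "missing" is a simple way to support the requirement that each attribute be clearly identified before a sub-process is considered fully specified.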
Figure: Generic Statistical Business Process Model (GSBPM).

Business architecture – business operational processes
So far, in the WP3 activity, we have restricted this analysis to the business operational processes, which comprise phases 4 to 7 of the GSBPM:
4 Collect - This phase collects all necessary data, using different collection modes, and loads them into the appropriate data environment.
5 Process - This phase describes the cleaning of data records and their preparation for analysis. For statistical outputs produced regularly, this phase occurs in each iteration.
6 Analyze - In this phase, statistics are produced and examined in detail. The “Analyze” and “Process” phases can be iterative and parallel: analysis can reveal a broader understanding of the data, which might show that additional processing is needed.
7 Disseminate - This phase manages the release of the statistical products to customers. For statistical outputs produced regularly, this phase occurs in each iteration.

Business process in the SDWH Layered Architecture
Source layer: the level at which we locate all the activities related to storing and managing internal (survey) or external (archive) raw data sources. Typically, this layer does not include any transformation of the collected data, but it should record how, when and from whom the collection was finalized. Based on the GSBPM, the sub-processes that could be included in this layer are:
4 Collect: 4.2 set up collection, 4.3 run collection, 4.4 finalize collection.

Integration layer: this layer performs the typical Extraction, Transformation and Loading functions, which must be carried out automatically or semi-automatically. All the activities of this layer relate to regular, iterative processes and structured activities, which clean, link and harmonize data and information in a common persistent operational area.
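The integration-layer work described above (clean, link and harmonize data into a common persistent area) can be sketched in a few lines. This is a minimal illustration under stated assumptions and not the authors' implementation. The records, the enterprise identifier `ent_id`, the function names and the harmonization rule (the survey value wins, and the administrative value fills gaps) are all hypothetical. A plain dictionary stands in for the persistent operational area.

```python
# Hypothetical raw records as they sit in the source layer (not transformed there).
SURVEY = [
    {"ent_id": " e1 ", "turnover": 120.0},
    {"ent_id": "E2", "turnover": None},          # item non-response
]
ADMIN = [
    {"ent_id": "E1", "turnover": 118.0, "nace": "C10"},
    {"ent_id": "e2", "turnover": 75.0, "nace": "G47"},
]


def clean_id(raw_id: str) -> str:
    """Clean: normalise enterprise identifiers so the two sources can be linked."""
    return raw_id.strip().upper()


def integrate(survey, admin):
    """Link survey and administrative records on the cleaned enterprise id and
    harmonize them into one record per enterprise (a dict stands in for the
    persistent operational area). Assumed rule: survey value wins, admin fills gaps."""
    store = {}
    for rec in admin:
        store[clean_id(rec["ent_id"])] = {
            "nace": rec["nace"],
            "turnover": rec["turnover"],
            "turnover_source": "admin",
        }
    for rec in survey:
        entry = store.setdefault(clean_id(rec["ent_id"]),
                                 {"nace": None, "turnover": None, "turnover_source": None})
        if rec["turnover"] is not None:
            entry["turnover"] = rec["turnover"]
            entry["turnover_source"] = "survey"
    return store


print(integrate(SURVEY, ADMIN))
```

Recording which source supplied each value (`turnover_source`) is one way to keep the traceability that the later layers need when they reuse the same integrated records for different outputs.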
Based on the GSBPM, we include in this layer:
5 Process: 5.1 integrate data, 5.2 classify & code, 5.3 review, validate & edit, 5.4 impute, 5.5 derive new variables, 5.6 calculate weights, 5.7 calculate aggregates, 5.8 finalize data files;
6 Analyze: 6.2 validate outputs, 6.4 apply disclosure control, 6.5 finalize outputs;
7 Disseminate: 7.1 update output systems.

Interpretation and data analysis layer: specialized for interactive, non-structured activities. It groups all the functionalities that help expert users produce information of strategic value or design new statistical strategies, and here experts can design the complete process of information delivery. Sub-processes in this layer must support experts in free data analysis. This includes testing any statistical methodology or strategy that could serve the regular production process in future iterations of the integration layer. Based on the GSBPM, we include in this layer:
4 Collect: 4.1 select sample, 4.2 set up collection;
5 Process: 5.1 integrate data;
6 Analyze: 6.1 prepare draft output, 6.3 scrutinize and explain, 6.5 finalize outputs;
7 Disseminate: 7.2 produce dissemination products.

Access layer: addressed to a wide range of users and IT tools for the final presentation of the information sought. To support different types of users, the sub-processes in this layer must be able to transform, or efficiently manipulate, data and information already estimated and validated in the previous layers, and to apply disclosure controls. The layer includes specialized Business Intelligence tools, among them graphics and publishing tools that generate graphs and tables for users. It also includes software interfaces towards other external integrated output systems; a typical example is the interchange of macro data via SDMX.
Based on the GSBPM, we include in this layer:
5 Process: 5.7 calculate aggregates;
6 Analyze: 6.4 apply disclosure control;
7 Disseminate: 7.1 update output systems, 7.3 manage dissemination products, 7.4 promote dissemination, 7.5 manage user support.

Business architecture – BPM Notation
To describe processes during the analysis activities, we used the Business Process Model and Notation (BPMN) in a simplified version with only four descriptor objects:
- activity object: describes the actor and the sub-process to be carried out, written as "Actor A: sub-process to be done";
- sequence flow object: shows the order in which the activities are performed;
- association object: associates objects and can indicate directionality with an open arrowhead: toward the data object to represent a result, from the data object to represent an input, and in both directions to indicate that the data are read and updated;
- data object (represented with a rhombus): shows the reader which data are required or produced by an activity.

Example of a workflow using the BPM Notation:
Figure: example workflow. Activities: B, C: 4.1 select sample; B, C: 4.2 set up collection; C: 4.3 run collection; A: 5.1 integrate data; A: 5.2 classify & code (priorities); A: 5.3 review, validate & edit. Data objects: surveys data, Statistical Business Register, administrative data. Legend: A: Register unit; B: Statistical Methods unit; C: Information Collection department.

Business architecture – BPM Notation Analysis
To begin designing a business architecture for a generic SDWH for business statistics, we focus our analysis on the operational processes of specific output variables. The business operational view defines the set of strategic, core and support operational structures that transcend functional and organizational boundaries. It also sets the boundary of the enterprise by identifying and describing the external entities, such as customers, suppliers and external systems, that interact with the business.
The operational structures describe which resources and controls are involved. The lowest operational level describes the manual and automated tasks that make up the workflow.

Business Architecture – BPM Analysis
To gain insight into how to design a business architecture, we asked each ESSnet participant to use BPMN to analyze the operational production lines of specific output variables related to Structural Business Statistics (SBS), Short Term Statistics (STS) and the Statistical Business Register. This choice was motivated by the intent to emphasize dependencies on the statistical domain and on production timing.
For SBS we considered:
- System of company accounts,
- PRODuction COMmunautaire (Prodcom).
For STS we considered:
- Monthly industrial production,
- Monthly retail sales,
- Quarterly turnover in services,
- Quarterly services producer prices,
- External Trade statistics.
Detailed synthetic results are shown later, as examples, to prepare the interactive session.

BPM Analysis: Structural Business Statistics (SBS)
Structural Business Statistics (SBS) cover industry, construction, trade and services. Presented according to the NACE Rev. 2 activity classification (Statistical Classification of Economic Activities in the European Community), they describe the structure, conduct and performance of businesses.
CASE 1
Code    Period   Description
11110   Annual   Number of enterprises
12110   Annual   Turnover
12120   Annual   Production value
16110   Annual   Number of persons employed
In general, SBS do not collect information on products: external trade and the production of specific products are covered by External Trade statistics and Prodcom, respectively.

Figure: BPM Analysis: Structural Business Statistics (EE case).

BPM Analysis: External Trade statistics (ET)
External Trade statistics track the value and quantity of goods traded between EU Member States (intra-EU trade) and between Member States and non-EU countries (extra-EU trade).
They are the official source of information on the imports, exports and trade balance of the EU, its Member States and the euro area. We will make a top-down analysis of the two ET measures, independently of the products. These measures are related to the other statistical production lines considered:
CASE 4
Code   Period    Description
01     Monthly   Quantity expressed in net mass
02     Monthly   Quantity expressed in supplementary units

Figure: BPM Analysis: External Trade statistics (IT case).

BPM Analysis: Short-term statistics (STS)
Short-term statistics (STS) describe the most recent developments in national economies. STS cover four major economic domains: industry, construction, retail trade and other services. In STS, developments in the different economic domains are described by a series of indicators (STS indicators), such as production, turnover, new orders received, prices, number of persons employed, gross wages and several more. STS indicators are published as indices that show the change in the indicator compared with a fixed reference year, generally with monthly frequency. These measures are related to the other statistical production lines considered:
CASE 3
Code    Period    Description
A:110   Monthly   Industrial production

Figure: BPM Analysis: Short-term statistics STS (IT case).

BPM Analysis: Statistical Business Register (SBR)
The Statistical Business Register is a register of all enterprises and their workplaces. The availability of statistical business registers is key to compiling consistent and comparable short-term and structural business statistics. SBRs are crucial for establishing efficient statistical survey frames, which aim to reduce the reporting burden on enterprises.
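To make the SBR's role in stratification and survey frames concrete, the sketch below computes the two SBR variables analyzed in this section (11110 number of enterprises, 11210 number of local units) per NACE stratum and extracts a minimal frame for selected strata. Everything here is a hypothetical illustration: the identifiers, NACE codes and function names are invented. As a further simplification, local units are counted in their enterprise's stratum, whereas in practice a local unit can carry its own activity code.

```python
from collections import Counter

# Hypothetical SBR extract (ids and NACE codes are invented for illustration).
ENTERPRISES = {"E1": "C10", "E2": "G47", "E3": "G47"}                   # enterprise id -> NACE
LOCAL_UNITS = [("L1", "E1"), ("L2", "E1"), ("L3", "E2"), ("L4", "E3")]  # (local unit id, enterprise id)


def counts_by_stratum(enterprises, local_units):
    """Per NACE stratum: 11110 = number of enterprises, 11210 = number of local units.
    Simplification: local units are counted in the stratum of their enterprise."""
    n_enterprises = Counter(enterprises.values())
    n_local_units = Counter(enterprises[ent_id] for _, ent_id in local_units)
    return {nace: {"11110": n_enterprises[nace], "11210": n_local_units[nace]}
            for nace in sorted(n_enterprises)}


def survey_frame(enterprises, strata):
    """Enterprises belonging to the selected strata: a minimal survey frame."""
    return sorted(ent_id for ent_id, nace in enterprises.items() if nace in strata)


print(counts_by_stratum(ENTERPRISES, LOCAL_UNITS))
print(survey_frame(ENTERPRISES, {"G47"}))
```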
We will make a top-down analysis of two significant SBR values. Although they are not direct statistical outputs at Eurostat level, they have an important impact on the economic stratification characteristics of STS and SBS:
CASE 5
Code    Period   Description
11110   Annual   Number of enterprises
11210   Annual   Number of local units

BPM Analysis: Statistical Business Register (PT case)
Figure: PT-case workflow. Activities: B: 4.1 select sample; B, C: 4.2 set up collection; C: 4.3 run collection; A: 5.1 integrate data; A: 5.2 classify & code (priorities); A: 5.3 review, validate & edit; A: 6.2 validate outputs; F, G: 6.2 validate outputs; A: 7.1 update output systems; D, E: 7.2 produce dissemination products. Data objects: surveys data, administrative data, complete SBR (T-1), SBR changes (T-1), SBR. Legend: A: Register unit; B: Statistical Methods unit; C: Information Collection department; D: Dissemination unit; E: Statistical Producers units; F: Economical Statistics department; G: National Accounts department.

Mapping the BPM-Notation on a SDWH layered architecture
Preparation of the interactive session: the process-goal information from the SBS, STS, ET and SBR strategies will then be used to derive the most realistic step towards an SDWH Business Architecture. In the following, we analyze each GSBPM phase and try to allocate its sub-processes on a SDWH layered architecture. Mapping a GSBPM operational process onto a SDWH architecture means tracing a process flow through the functional layers.
To this aim, we use a graphical mapping of the GSBPM onto a SDWH layered architecture, where:
- GSBPM phases are on the horizontal axis, in different columns;
- SDWH layers are on the vertical axis, in different rows;
- their intersections form a cell matrix representing the possible sub-processes;
- a gray diagonal represents, for a generic statistical process, the most likely area of association between SDWH layers and GSBPM phases.
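The cell matrix described above can be sketched in code. This minimal, hypothetical illustration fills the grid with the layer-by-layer allocation given earlier for the source, integration, interpretation and access layers. It does not use the gray diagonal, whose exact shape cannot be recovered from the text. The names `LAYER_SUBPROCESSES`, `build_matrix`, `shared_subprocesses` and `unallocated` are illustrative. The columns are the sub-processes of GSBPM phases 4 to 7 (4.1 to 4.4, 5.1 to 5.8, 6.1 to 6.5, 7.1 to 7.5).

```python
# Layer -> GSBPM sub-processes, as allocated in the layer-by-layer analysis.
LAYER_SUBPROCESSES = {
    "access": {"5.7", "6.4", "7.1", "7.3", "7.4", "7.5"},
    "interpretation and analysis": {"4.1", "4.2", "5.1", "6.1", "6.3", "6.5", "7.2"},
    "integration": {"5.1", "5.2", "5.3", "5.4", "5.5", "5.6", "5.7", "5.8",
                    "6.2", "6.4", "6.5", "7.1"},
    "source": {"4.2", "4.3", "4.4"},
}
# Number of sub-processes in GSBPM phases 4 (Collect) to 7 (Disseminate).
PHASE_SIZES = {4: 4, 5: 8, 6: 5, 7: 5}


def columns():
    """Grid columns: every sub-process code of phases 4-7, in GSBPM order."""
    return [f"{phase}.{i}" for phase, n in PHASE_SIZES.items() for i in range(1, n + 1)]


def build_matrix(mapping):
    """Rows = layers, columns = sub-processes; True where a sub-process is allocated."""
    return {layer: [code in codes for code in columns()] for layer, codes in mapping.items()}


def shared_subprocesses(mapping):
    """Sub-processes allocated to more than one layer, in column order."""
    return [c for c in columns() if sum(c in codes for codes in mapping.values()) > 1]


def unallocated(mapping):
    """Sub-processes that no layer covers."""
    return [c for c in columns() if not any(c in codes for codes in mapping.values())]


def render(matrix):
    """Plain-text grid: 'o' marks an allocated cell, '.' an empty one."""
    width = max(len(layer) for layer in matrix)
    lines = [" " * width + " | " + " ".join(columns())]
    for layer, cells in matrix.items():
        lines.append(f"{layer:<{width}} | " + " ".join(" o " if hit else " . " for hit in cells))
    return "\n".join(lines)


print(render(build_matrix(LAYER_SUBPROCESSES)))
print(shared_subprocesses(LAYER_SUBPROCESSES), unallocated(LAYER_SUBPROCESSES))
```

With this allocation, every sub-process of phases 4 to 7 lands in at least one layer. Six of them (4.2, 5.1, 5.7, 6.4, 6.5 and 7.1) are allocated to two layers, which is where a concrete process flow has to choose a row.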
Figure: GSBPM-on-SDWH mapping grid. Columns: GSBPM sub-processes 4.1 to 4.4 (4 Collect), 5.1 to 5.8 (5 Process), 6.1 to 6.5 (6 Analyze) and 7.1 to 7.5 (7 Disseminate). Rows: access layer, interpretation and analysis layer, integration layer, source layer.

To include a sub-process in a process flow, we fill a matrix cell with a circle and connect subsequent sub-processes. As in BPMN, rhombi represent data objects, and they must be positioned at layer level. To identify the actor of each sub-process, we fill each circle with the color of its actor; the actors and their associated colors are listed in a separate legend.
Figure: example mapping on the same grid, with the data objects "admin balance-fiscal-tax" and "CAWI". Legend: A - Data collection department; B - Enterprise statistical department.

The picture shows an example of GSBPM-on-SDWH mapping of the Statistical Business Register operational process, made by PT.

The picture shows a merge of all the BPM analyses of the statistical processes of the different ESSnet members, mapped onto the SDWH. It makes evident how articulated the allocation becomes when several production lines overlap.
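The placement rules above can be expressed as a small data model. In this model, circles are activities placed in a (layer, sub-process) cell, consecutive activities are connected by sequence flows, and rhombi are data objects that sit at layer level. Actor letters from a legend stand in for the circle colors. This is a hypothetical sketch, not a tool used in the analysis. The flow itself is invented, and only the legend letters and the data-object labels "CAWI" and "admin balance-fiscal-tax" come from the example figure; their placement in the source layer is an assumption. All class and function names are illustrative.

```python
from dataclasses import dataclass

# Rows and columns of the mapping grid (GSBPM phases 4-7 only).
LAYERS = ["access", "interpretation and analysis", "integration", "source"]
COLUMNS = [f"{p}.{i}" for p, n in {4: 4, 5: 8, 6: 5, 7: 5}.items() for i in range(1, n + 1)]


@dataclass(frozen=True)
class Activity:
    """A circle in the grid: one sub-process, in one layer, done by one or more actors."""
    code: str                # GSBPM sub-process, e.g. "4.3"
    layer: str               # SDWH layer (grid row)
    actors: tuple[str, ...]  # legend keys; they stand in for the circle colours


@dataclass(frozen=True)
class DataObject:
    """A rhombus: placed at layer level, not in a sub-process column."""
    name: str
    layer: str


def place(flow, data_objects):
    """Fill the grid: {layer: {code: actors}} plus {layer: [data object names]}."""
    cells = {layer: {} for layer in LAYERS}
    objects = {layer: [] for layer in LAYERS}
    for act in flow:
        if act.layer not in cells or act.code not in COLUMNS:
            raise ValueError(f"cell outside the grid: {act}")
        cells[act.layer][act.code] = act.actors
    for obj in data_objects:
        if obj.layer not in objects:
            raise ValueError(f"unknown layer: {obj.layer}")
        objects[obj.layer].append(obj.name)
    return cells, objects


def sequence_flows(flow):
    """Connect each sub-process to the next one in the flow (the lines between circles)."""
    return [(a.code, b.code) for a, b in zip(flow, flow[1:])]


def layer_crossings(flow):
    """Number of steps in which the flow moves from one layer to another."""
    return sum(a.layer != b.layer for a, b in zip(flow, flow[1:]))


def render(cells, objects):
    """Plain-text grid: actor letters in filled cells, '.' elsewhere, data objects at row end."""
    width = max(map(len, LAYERS))
    lines = [" " * width + " | " + " ".join(f"{c:^4}" for c in COLUMNS)]
    for layer in LAYERS:
        marks = []
        for c in COLUMNS:
            actors = cells[layer].get(c)
            mark = "".join(actors) if actors else "."
            marks.append(f"{mark:^4}")
        data = ", ".join(objects[layer])
        lines.append(f"{layer:<{width}} | " + " ".join(marks) + (f" | <{data}>" if data else ""))
    return "\n".join(lines)


# Hypothetical flow using the legend of the example figure (A, B).
flow = [Activity("4.3", "source", ("A",)), Activity("5.1", "integration", ("B",)),
        Activity("6.1", "interpretation and analysis", ("B",)), Activity("7.1", "access", ("B",))]
data = [DataObject("CAWI", "source"), DataObject("admin balance-fiscal-tax", "source")]
print(render(*place(flow, data)))
```

Counting layer crossings gives a rough, easily compared measure of how far a production line departs from a single-layer flow. That may help when comparing the merged analyses of several production lines.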