“Overview WP3: Building and implementation of the SDWH: Architectural and technical aspects”

Antonio Laureti Palma
Head of Unit: Implementation of the new management and development model of IT functions
Italy, ISTAT - National Accounts and Economic Statistics Department

Final Workshop, ESSnet on “Micro Data Linking and Data Warehousing in Statistical Production”
Amsterdam, 25 & 26 September 2013

WP 3 - Building and implementation of the S-DWH: Architectural and technical aspects

The goal of the statistical data warehouse is to enable NSIs to produce flexible outputs with maximum re-use of data that are already in the statistical system. This work package focused on the essential architectural and technical elements for developing a new model of data re-use, i.e. a new model of statistical production based on a data warehouse architecture, which we define as the Statistical Data Warehouse (S-DWH).

The countries involved in WP3 are: IT, PT, LT, EE, SE, UK.

To ensure an efficient way of working and clear communication between WP members, specific WP meetings were held.

Our aims were to:
• Provide the Business Architecture.
• Define an architectural framework for the S-DWH for business statistics.
• Produce the functional architecture of the S-DWH.
• Study the management of the information flow between data sources and the S-DWH central administration.
• Produce a modular workflow approach.
• Identify an 'ideal' architectural design for an actual implementation and development strategy.
• Study various technical aspects of the S-DWH (performance issues, design issues, etc.).
The following deliverables were produced:
• WP3.1 – “Business Architecture of the S-DWH” (IT, PT): Antonio Laureti Palma, Sonia Quaresma
• WP3.2 – “Modular Workflow of the S-DWH” (EE, LT): Allan Randlepp, Valerij Zavoronok
• WP3.3 – “Functional Architecture of the S-DWH” (SE, IT): Bjorn Berglund, Antonio Laureti Palma
• WP3.4 – “Overview of various technical aspects” (LT): Valerij Zavoronok
• WP3.5 – “Relate the 'ideal' architectural scheme into an actual development and implementation strategy” (PT, IT, LT, EE, SE, UK): Pedro Cunha, Antonio Laureti Palma, Allan Randlepp, Valerij Zavoronok, Bjorn Berglund, Colin Bower

Statistical information system view:

In the context of the ESSnet, we identify the overlap between these systems as the effective DWH, in which we store statistical information from several statistical domains to support any analysis for strategic NSI or European decisions related to statistics. This identifies a new possible approach to statistical production, in which data from different production departments are integrated and stored for multiple uses.
The general architecture is characterized by four functional layers. Listed from the top of the architectural pile down to the bottom, they are defined as:
• IV° - access layer: for the final presentation, dissemination and delivery of the information sought, specialized for external users;
• III° - interpretation and data analysis layer: enables any data analysis or data mining that supports statistical design or new strategies, as well as data re-use;
• II° - integration layer: where all operational activities needed for any statistical production process are carried out; in this layer data are mainly transformed from raw to cleaned data, and these activities are carried out by internal operators;
• I° - source layer: the level in which we locate all activities related to storing and managing internal or external data sources.

Data Flows, Layered view:
[Figure: the four layers stacked from the I° - Sources Layer at the bottom to the IV° - Access Layer at the top. Figure labels: "strategic decisions", "regular dissemination of statistical information", "(re)use data", "produce the necessary information".]

WP3.1 – “Business Architecture of the S-DWH”

In the deliverables, indications for an S-DWH as a common IT infrastructure for statistical production have been presented. These are based on three IT domains (WP3.5):
• Business Architecture - used to align strategic objectives and tactical demands; it is sub-divided into business processes (WP3.1), used to create the primary value stream, and management processes (WP3.3), which govern the operation of a system;
• Information Systems Architecture - the conceptual organization of the effective S-DWH, which is able to support tactical demands (WP3.2);
• Technology Architecture - the combined set of software, hardware and networks able to develop and support IT services (WP3.2, WP3.4).
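To make the four-layer data flow described earlier more concrete, here is a minimal Python sketch of records moving from the source layer up to the access layer. It is only an illustration under stated assumptions: the record fields (`unit`, `activity`, `turnover`), the toy cleaning rule and the function names `integrate`, `interpret` and `publish` are hypothetical and do not come from the deliverables.

```python
from enum import IntEnum


class Layer(IntEnum):
    """The four S-DWH functional layers, numbered bottom-up as in the text."""
    SOURCE = 1          # storing and managing internal or external data sources
    INTEGRATION = 2     # operational production: raw data become cleaned data
    INTERPRETATION = 3  # data analysis, data mining, support for re-use
    ACCESS = 4          # presentation, dissemination and delivery


def integrate(raw_records):
    """Integration layer: drop records with a missing value (toy cleaning rule)."""
    return [r for r in raw_records if r["turnover"] is not None]


def interpret(clean_records):
    """Interpretation layer: aggregate cleaned micro data by activity code."""
    totals = {}
    for r in clean_records:
        totals[r["activity"]] = totals.get(r["activity"], 0) + r["turnover"]
    return totals


def publish(totals):
    """Access layer: format the aggregates for external users."""
    return [f"{activity}: {value}" for activity, value in sorted(totals.items())]


# Source layer: hypothetical raw micro data (assumed fields, invented values).
raw = [
    {"unit": "A", "activity": "C10", "turnover": 120},
    {"unit": "B", "activity": "C10", "turnover": None},
    {"unit": "C", "activity": "G47", "turnover": 80},
    {"unit": "D", "activity": "C10", "turnover": 30},
]

report = publish(interpret(integrate(raw)))
```

In this sketch each function stands in for one layer, so the same cleaned data could feed other analyses in the interpretation layer without going back to the sources.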
Views of the BA:
• Statisticians: There are typically only a handful of sophisticated analysts (statisticians and operations research types) in any organization. They are some of the best users of the data warehouse; their work can deeply influence the operations and profitability of the company.
• Knowledge Workers: Usually a relatively small number of analysts perform the bulk of new queries and analyses against the data warehouse. These are the users who get the "Designer" or "Analyst" versions of user access tools. After a few iterations, those queries and reports typically get published for the benefit of the Information Consumers.
• Information Consumers: Characteristically, most users of the data warehouse are Information Consumers; they will probably never compose a true ad hoc query. They use static or simple interactive reports that others have developed.
• Executives: Executives are a special case of the Information Consumers group. Few executives actually issue their own queries, but an executive's slightest musing can generate a flurry of activity among the other types of users.

As a starting point, we described and analyzed some NSI business processes using a simplified Business Process Model and Notation (BPMN), adopting the Generic Statistical Business Process Model (GSBPM) taxonomy.
We have applied a top-down analysis, starting from specific statistical outputs and working down to the sources, for the following European regulated outputs:
• Structural Business Statistics (SBS)
• Short Term Statistics (STS)
• Production Communautaire (Prodcom)
• Trade Statistics

We have synthesized the results by graphically mapping operational processes onto the S-DWH functional layer architecture.

WP3.2 – “Modular Workflow of the S-DWH”

In deliverable WP3.2 the integration model and the warehouse approach were put together, independently of the statistical methodological strategy. The S-DWH environment has been used to manage changes in the statistical production process through the Interpretation Layer, which is able to refine the S-DWH itself. Integration has been seen from three main viewpoints:
• Technical integration – integrating IT platforms and software tools;
• Process integration – integrating statistical processes such as survey design, sample selection, data processing, etc.;
• Data integration – data are stored once, but used for multiple purposes.
When we put these three integration aspects together, we get an S-DWH, which is built on integrated technology.

It has been emphasized that different aggregate data on different topics cannot be produced independently of each other, but only as integrated parts of a comprehensive information system. In this case statistical concepts and infrastructures are shared, and the data in a common statistical domain are stored once for multiple purposes. Processing flows are built up around input variables, or groups of input variables, to feed a variable-based warehouse, which in turn feeds the flows that generate and publish cubes. This corresponds to a system able to manage several kinds of data (micro, macro and meta) from different phases of a statistical production process.
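The "stored once, used for multiple purposes" idea behind a variable-based warehouse can be sketched in a few lines of Python. This is a hedged illustration, not the design from the deliverable: the key layout `(unit, period, variable)`, the helper names `load` and `cube`, and the unit attributes are all assumptions made for the example.

```python
from collections import defaultdict

# Variable-based store: one entry per (unit, period, variable), stored once.
warehouse = {}


def load(unit, period, variable, value):
    """Store a value once; reloading the same key overwrites, never duplicates."""
    warehouse[(unit, period, variable)] = value


def cube(variable, unit_attrs, dimension):
    """Aggregate one stored variable by period and one classification dimension."""
    out = defaultdict(float)
    for (unit, period, var), value in warehouse.items():
        if var == variable:
            out[(period, unit_attrs[unit][dimension])] += value
    return dict(out)


# Hypothetical unit attributes (in practice these would come from metadata).
units = {
    "A": {"nace": "C10", "size": "small"},
    "B": {"nace": "C10", "size": "large"},
    "C": {"nace": "G47", "size": "small"},
}

load("A", "2012", "turnover", 100.0)
load("B", "2012", "turnover", 400.0)
load("C", "2012", "turnover", 50.0)

# Two different output cubes built from the same stored micro data.
by_activity = cube("turnover", units, "nace")
by_size = cube("turnover", units, "size")
```

Both cubes read the same three stored values, so a correction loaded once into the store would be reflected in every output derived from it.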
Integrated Warehouse model

The S-DWH provides the possibility to use different platforms and software in separate layers, to re-use components already available in-house or internationally, and to use different software inside the same layer. A possible example of software component re-use is the CORE (COmmon Reference Environment) project, in which specific components are used for moving data between S-DWH layers or inside a layer.

WP3.3 – “Functional Architecture of the S-DWH”

The functional architecture must be based on business needs and context; to support a generic statistical production process we have identified 14 management processes. To describe the main high-level functionalities of the S-DWH from the users' viewpoint we have introduced a Functional Architecture (FA) diagram. A functional diagram describes the architecture of a software product from a usage perspective; it contains modules that represent the basic functions of the software product. This has been described by combining, in an FA diagram, the Generic Statistical Information Model (GSIM), using the Generic Statistical Business Process Model (GSBPM) convention when needed.

The GSIM is a reference framework of internationally agreed definitions, attributes and relationships that describe the pieces of information used in the production of official statistics (information objects).
The GSIM information objects are organized into four groups:
• The Business Group (blue) is used to describe the designs and plans of Statistical Programs.
• The Production Group (red) is used to describe each step in the statistical process, with a particular focus on describing the inputs and outputs of these steps.
• The Concepts Group (green) contains sets of information objects that describe and define the terms used when talking about the real-world phenomena that the statistics measure in their practical implementation (population frame and units).
• The Structures Group (orange) contains sets of information objects that describe and define the terms used in relation to data and their structure.

WP3.4 – “Overview of various technical aspects”

This deliverable identifies the Technology Architecture. It is intended as an overview of software packages existing on the market or developed on request in NSIs, in order to describe the solutions that would meet NSI needs. A wide variety of tools used in statistical production was found; they are separated into several main groups. Generally, in the access, interpretation and source layers we use standardized, out-of-the-box tools that are not highly customizable in the sense of adaptation to statistical processes. In the integration layer, where all operational activities needed for all statistical elaboration processes are carried out, mainly self-developed software is used. In such cases, sharing of experience between NSIs is very desirable. Additionally, using common models and approaches ensures that unnecessary preparatory work is avoided and that applications are developed using the same principles and good practices, which are common to all NSIs and reflect the same main trends and procedures.
WP3.5 – S-DWH: “Relate the 'ideal' architectural scheme into an actual development and implementation strategy”

Business Architecture
• Layers
• Functionalities and Management processes
Bjorn Berglund, SE

Information Systems Architecture
• S-DWH is a metadata-driven system
• Layered approach of a full active S-DWH
Pedro Cunha, PT