DWH-SGA2-WP3 - Presentation WS Amsterdam

“Overview WP3: Building and implementation of the SDWH: Architectural and technical aspects”
Antonio Laureti Palma
Head of Unit: Implementation of the new management and
development model of IT functions
Italy, ISTAT- National Accounts and Economic Statistics Department
FINAL Workshop ESS NET ON: “MICRO DATA LINKING AND DATA
WAREHOUSING IN STATISTICAL PRODUCTION”
Amsterdam 25 & 26 SEPTEMBER 2013
WP 3 - Building and implementation of the S-DWH:
Architectural and technical aspects
 The goal of the statistical data warehouse is to enable NSIs to produce
flexible outputs with a maximum re-use of data that is already in the
statistical system.
 This work package focused on all essential architectural and technical
elements for developing a new model of data re-use, i.e. a new model of
statistical production based on a Data Warehouse architecture, which we
define as Statistical-Data Warehouse (S-DWH).
 The countries involved in the third WP are: IT, PT, LT, EE, SE, UK
 To ensure an efficient way of working and clear communication
between WP members, specific WP meetings were held.
 Our aims were to:
 Provide a Business Architecture.
 Define an architectural framework for the S-DWH for business
statistics.
 Produce a functional architecture of the S-DWH.
 Study management of the information flow between data sources
and S-DWH central administration.
 Produce a modular workflow approach.
 Identify an 'ideal' architectural design for an actual implementation
and development strategy.
 Study various technical aspects regarding the S-DWH (performance
issues, design issues, etc.).
The following deliverables were produced:
• WP3.1 – “Business Architecture of the S-DWH”
(IT, PT) Antonio Laureti Palma, Sonia Quaresma
• WP3.2 – “Modular Workflow of the S-DWH”
(EE, LT) Allan Randlepp, Valerij Zavoronok
• WP3.3 – “Functional Architecture of the S-DWH”
(SE, IT) Bjorn Berglund, Antonio Laureti Palma
• WP3.4 – “Overview of various technical aspects”
(LT) Valerij Zavoronok
• WP3.5 – “Relate the 'ideal' architectural scheme into an actual
development and implementation strategy”
(PT, IT, LT, EE, SE, UK) Pedro Cunha, Antonio Laureti Palma, Allan Randlepp,
Valerij Zavoronok, Bjorn Berglund, Colin Bower
Statistical information system view (figure):
 In the context of the ESSnet, we identify this overlap between systems
as the effective DWH, in which we store statistical information from
several statistical domains to support any analysis for strategic
NSI or European decisions related to statistics.
 This identifies a possible new approach to statistical production, in
which data from different production departments are integrated and
stored for multiple uses.
 The general architecture is characterized by four functional layers.
Listed from the top of the architectural stack down to the bottom, they
are defined as:
IV° - access layer, for the final presentation, dissemination and
delivery of the information sought, specialized for external users;
III° - interpretation and data analysis layer, which enables any data
analysis or data mining needed to support statistical design or any
new strategies, as well as data re-use;
II° - integration layer, where all operational activities needed for
any statistical production process are carried out; in this layer data are
mainly transformed from raw to cleaned data, and these activities are
carried out by internal operators;
I° - source layer, the level at which we locate all the activities
related to storing and managing internal or external data sources.
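The layer ordering above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the names `Layer` and `upward_path` are hypothetical and do not come from the deliverables, and the sketch models only the upward movement of data from the sources towards dissemination.

```python
from enum import IntEnum
from typing import List


class Layer(IntEnum):
    """The four S-DWH functional layers, numbered from the bottom (I) to the top (IV)."""
    SOURCE = 1          # I: storing and managing internal or external data sources
    INTEGRATION = 2     # II: operational production, raw data become cleaned data
    INTERPRETATION = 3  # III: data analysis, data mining, data re-use
    ACCESS = 4          # IV: presentation, dissemination, delivery to external users


def upward_path(start: Layer, end: Layer) -> List[Layer]:
    """Return the layers crossed when data moves up from `start` to `end`."""
    if start > end:
        raise ValueError("this sketch only models upward flows")
    return [layer for layer in Layer if start <= layer <= end]


if __name__ == "__main__":
    # Data collected in the source layer and disseminated through the access layer
    # passes through every layer in between.
    print([layer.name for layer in upward_path(Layer.SOURCE, Layer.ACCESS)])
```

Using an ordered enumeration keeps the "bottom to top" relation explicit, so a check such as `Layer.INTEGRATION < Layer.ACCESS` reads the same way as the layered diagram.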
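Data flows, layered view (figure): the four layers are stacked from the
I° - sources layer at the bottom, through the II° - integration layer and
the III° - interpretation and analysis layer, to the IV° - access layer at
the top. Figure labels: "(re)use data", "produce the necessary information",
"regular dissemination of statistical information", "strategic decisions".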
WP3.1 – “Business Architecture of the S-DWH”
 In the deliverables, indications for an S-DWH as a common IT
infrastructure for statistical production have been presented. These are
based on three IT domains (WP3.5):
 Business Architecture - used to align strategic objectives and
tactical demands; it is sub-divided into:
 business processes (WP3.1), used to create the primary value
stream,
 management processes (WP3.3), which govern the operation of a
system;
 Information Systems Architecture - the conceptual organization of
the effective S-DWH, which is able to support tactical demands
(WP3.2);
 Technology Architecture - the combined set of software, hardware
and networks able to develop and support IT services (WP3.2,
WP3.4).
 Views of the BA, by type of user:
Statisticians: There are typically only a handful of sophisticated
analysts (statisticians and operations research types) in any organization.
They are some of the best users of the data warehouse, and their work
can deeply influence the operations and profitability of the
company.
Knowledge Workers: Usually a relatively small number of analysts
perform the bulk of new queries and analyses against the data warehouse.
These are the users who get the "Designer" or "Analyst" versions of user
access tools. After a few iterations, those queries and reports typically get
published for the benefit of the Information Consumers.
Information Consumers: Characteristically most users of the data
warehouse are Information Consumers; they will probably never compose
a true ad hoc query. They use static or simple interactive reports that others
have developed.
Executives: Executives are a special case of the Information Consumers
group. Few executives actually issue their own queries, but an executive's
slightest musing can generate a flurry of activity among the other types of
users.
 As a starting point, we described and analyzed some NSI business
processes using a simplified Business Process Model and Notation
(BPMN), adopting the Generic Statistical Business Process
Model (GSBPM) taxonomy.
 We applied a top-down analysis, starting from a specific
statistical output and working down to its sources, for the following
European regulated outputs:
 Structural Business Statistics (SBS),
 Short Term Statistics (STS),
 Production Communautaire (Prodcom),
 Trade Statistics.
We have synthesized the results by graphically mapping operational processes
onto the S-DWH functional layer architecture (figure).
WP3.2 – “Modular Workflow of the S-DWH”
 In deliverable WP3.2, the integration model and the warehouse
approach were brought together, independently of the statistical
methodological strategy.
 The S-DWH environment has been used to manage changes in the
statistical production process through the Interpretation Layer, which is
able to refine the S-DWH itself.
 Integration has been seen from three main viewpoints:
 Technical integration – integrating IT platforms and software
tools;
 Process integration – integrating statistical processes such as
survey design, sample selection, data processing, etc.;
 Data integration – data are stored once, but used for multiple
purposes.
When we put these three integration aspects together, we get an S-DWH, which is built on integrated technology.
 It has been emphasized that different aggregate data on different
topics should not be produced independently of each other, but as
integrated parts of a comprehensive information system.
 In this case statistical concepts and infrastructures are shared,
and the data in a common statistical domain are stored once for
multiple purposes.
 Processing flows are built up around input variables, or groups of
input variables, to feed a variable-based warehouse, from which cube
flows are generated and published.
 This corresponds to a system able to manage several kinds of
data (micro, macro and meta) from different phases of a statistical
production process.
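As a minimal sketch of the "stored once, used for multiple purposes" idea, the Python fragment below keeps micro data in a single variable-based store and derives two different outputs from it. All data values, unit identifiers, activity codes and function names are invented for illustration; the deliverable does not prescribe any particular storage layout.

```python
from collections import defaultdict

# Hypothetical micro data: one row per (unit_id, variable, value), stored once.
MICRO_DATA = [
    ("ent-1", "turnover", 120.0),
    ("ent-1", "employees", 10.0),
    ("ent-2", "turnover", 80.0),
    ("ent-2", "employees", 5.0),
]

# Hypothetical classification metadata shared by all outputs.
ACTIVITY = {"ent-1": "C", "ent-2": "G"}


def total_by_activity(variable):
    """Aggregate one stored variable by activity code (a simple one-dimensional cube)."""
    totals = defaultdict(float)
    for unit_id, name, value in MICRO_DATA:
        if name == variable:
            totals[ACTIVITY[unit_id]] += value
    return dict(totals)


def grand_total(variable):
    """A second output computed from the same stored rows, without copying them."""
    return sum(value for _, name, value in MICRO_DATA if name == variable)


if __name__ == "__main__":
    print(total_by_activity("turnover"))
    print(grand_total("employees"))
```

Both outputs read the same rows and the same metadata, so a correction made once in the store is reflected in every output that is regenerated from it.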
Integrated Warehouse model (figure)
 The S-DWH makes it possible to use different platforms and
software in separate layers, to use different software inside the same
layer, and to re-use components already available in-house or
internationally.
 A possible example of software component re-use is the CORE
(COmmon Reference Environment) project, in which specific
components are used for moving data between S-DWH layers or
inside a layer.
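The idea of reusable components that move data between layers, or inside a layer, can be sketched as a chain of small steps. The fragment below is a hypothetical illustration only: it is not the CORE interface, and the names `Step`, `run_workflow` and the three example components are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    """A reusable component that takes data in one layer and leaves it in another."""
    name: str
    from_layer: str
    to_layer: str
    run: Callable[[list], list]


def run_workflow(steps: List[Step], data: list, start_layer: str = "source") -> Tuple[str, list]:
    """Run the steps in order, checking that each one starts where the previous one ended."""
    current = start_layer
    for step in steps:
        if step.from_layer != current:
            raise ValueError(
                f"{step.name} expects data in {step.from_layer!r}, got {current!r}"
            )
        data = step.run(data)
        current = step.to_layer
    return current, data


def drop_missing(rows: list) -> list:
    """Hypothetical cleaning component: remove missing values."""
    return [row for row in rows if row is not None]


def total(rows: list) -> list:
    """Hypothetical aggregation component: reduce the rows to a single total."""
    return [sum(rows)]


WORKFLOW = [
    Step("load", "source", "integration", list),                # between layers
    Step("clean", "integration", "integration", drop_missing),  # inside a layer
    Step("aggregate", "integration", "interpretation", total),  # between layers
]

if __name__ == "__main__":
    print(run_workflow(WORKFLOW, [1, None, 2]))
```

Because each step declares the layer it reads from and the layer it writes to, a component can be swapped for another implementation as long as it keeps the same two declarations.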
WP3.3 – “Functional Architecture of the S-DWH”
 The functional architecture must be based on business needs and
context. To support a generic statistical production process, we have
identified 14 management processes.
 To describe the main high-level functionalities of the S-DWH from
the users' viewpoint, we have introduced a Functional Architecture (FA)
diagram.
 A functional diagram describes the architecture of a software
product from a usage perspective. The functional diagram contains
modules that represent the basic functions of a software product.
 This has been described in an FA diagram that draws on the
Generic Statistical Information Model (GSIM), using the Generic
Statistical Business Process Model (GSBPM) convention where
needed.
 The GSIM is a reference framework of internationally agreed
definitions, attributes and relationships that describe the pieces of
information that are used in the production of official statistics
(information objects).
 The Business Group (blue) is used to describe the designs and plans
of Statistical Programs
 The Production Group (red) is used to describe each step in the
statistical process, with a particular focus on describing the inputs and
outputs of these steps
 The Concepts Group (green) contains sets of information objects that
describe and define the terms used when talking about real-world
phenomena that the statistics measure in their practical implementation
(population frame and units)
 The Structures Group (orange) contains sets of information objects
that describe and define the terms used in relation to data and their
structure
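The four groups above can be written down as a small lookup, which may help when tagging the elements of a functional diagram. This is a sketch under our own assumptions: the colours follow the slide, but the tagged names in `ILLUSTRATIVE_OBJECTS` are paraphrased from the descriptions above and are not official GSIM information object names.

```python
from enum import Enum


class GsimGroup(Enum):
    """The four GSIM groups, with the colour used for each on the slide."""
    BUSINESS = "blue"
    PRODUCTION = "red"
    CONCEPTS = "green"
    STRUCTURES = "orange"


# Illustrative tags only, derived from the wording of the slide.
ILLUSTRATIVE_OBJECTS = {
    "statistical program design": GsimGroup.BUSINESS,
    "process step input": GsimGroup.PRODUCTION,
    "process step output": GsimGroup.PRODUCTION,
    "population frame": GsimGroup.CONCEPTS,
    "unit": GsimGroup.CONCEPTS,
    "data structure": GsimGroup.STRUCTURES,
}


def objects_in(group):
    """Return the illustrative object names tagged with the given group, sorted."""
    return sorted(name for name, g in ILLUSTRATIVE_OBJECTS.items() if g is group)


if __name__ == "__main__":
    for group in GsimGroup:
        print(group.name, group.value, objects_in(group))
```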
WP3.4 – “Overview of various technical aspects”
 This deliverable identifies the Technology Architecture. It is intended as an
overview of software packages existing on the market or developed on
request in NSIs, in order to describe the solutions that would meet NSI needs.
 A wide variety of tools was found to be used in statistical production;
they fall into several main groups.
 Generally, in the access, interpretation and source layers we use standardized
out-of-the-box tools, which are not highly customizable in the sense of
adaptation to statistical processes.
 In the integration layer, where all operational activities needed for all
statistical elaboration processes are carried out, mainly self-developed
software is used. In such cases, sharing experience between NSIs is very desirable.
 Additionally, using common models and approaches ensures that
unnecessary preparatory work is avoided and that applications are
developed using the same principles and good practices, which are common to
all NSIs and reflect the same main trends and procedures.
WP3.5 – S-DWH: “Relate the 'ideal' architectural
scheme into an actual development and
implementation strategy”
 Business Architecture
Layers Functionalities and Management processes
Bjorn Berglund, SE
 Information Systems Architecture
 S-DWH is a metadata-driven system
Layered approach of a full active S-DWH
Pedro Cunha, PT