CoC-DWH - WP2 - 2.3 Questionnaire on tools in layers

advertisement
CENTRE OF COMPETENCE
ON DATA WAREHOUSING
QUESTIONNAIRE ON THE USE OF
SOFTWARE TOOLS
ACCORDING TO THE LAYERED
S-DWH ARCHITECTURE
Dear Colleagues,
Some of the objectives of the Centre of Competence – Data Warehousing (CoC-DWH) are:

Contacting ESS members for identifying and prioritizing relevant projects and
support requests included clear arrangements about support to be delivered by the
CoC-DWH.

Setting up the knowledge repository in the ESSnet DWH domain of the CROS portal
and implement a plan for active maintenance, including S-DWH best practice cases
in ESS member states.

Sharing experiences on the general aspects of running a Centre of Competence and
getting feedback for further evolution of this CoC concept.
As part of this process, we would like to invite you to fill in this questionnaire. The main goal
of the questionnaire is to better understand your current situation and needs, and to
provide the benefit of experience and best practice to other countries.
We would be grateful if you could complete questionnaire by 29 August 2014 if possible.
Please send your answers to Valerij.Zavoronok@stat.gov.lt (with CC to
hct.goossens@cbs.nl).
Best regards,
the CoC-DWH project team
2
National Statistical Institute:
_____________________________________________
First name / Surname:
_____________________________________________
Position / department:
_____________________________________________
Email:
_____________________________________________
1.
Background
In an S-DWH system we identified four functional layers, starting from the most detailed
bottom level up to the top of the S-DWH architecture where conceptual level is placed. The
ground level corresponds to the area where the process starts, while the top of the pile is
where the data warehousing process finishes. There are:
IV – access layer, for the final presentation, dissemination and delivery of the
information sought specialized for external, relatively to NSI or Eurostat, users;
III – interpretation and data analysis layer, enables any data analysis or data mining,
functional to support statistical design or any new strategies, as well as data re-use;
functionality and data are optimized then for internal users, specifically for
statistician methodologists or statistician experts.
II – integration layer, is where all operational activities needed for any statistical
production process are carried out; in this layer data are manly transformed from raw
to cleaned data and this activities are carried on by internal operators;
I – source layer, is the level in which we locate all the activities related to storing and
managing internal or external data sources.
The layered architecture reflects a conceptual organization in which we will consider the first
two levels as pure statistical operational infrastructures, functional for acquiring, storing,
editing and validating data and the last two layers as the effective data warehouse, i.e. levels
in which data are accessible for data analysis. This is generally composed of two main
different functional environments: an operational where all available information is collect
and build-up, usually defined as Extraction, Transformation and Loading (ETL) environment,
while an analytical part is the actual data warehouse, i.e. where data analysis, or mining,
reports for executives and statistical deliverables are realized.
3
The lower layers correspond to the area where the process starts, while the top layers
represent where the data warehousing processes are carried out. This reflects a conceptual
organization in which we consider the first two layers as operational IT infrastructures and
the last two layers as the effective data warehouse.
Can the functional scheme described above be applied in your NSI ?
What are the differences ?
4
2.
Access layer
The principal purpose of data warehouse is to provide information to its users for strategic
decision-making. These users interact with the warehouse throughout Access layer using end
user access tools.
 Specialised Business Intelligence (BI) Tools for data access
BI tools are a type of software that is designed to retrieve, analyse and report data. This
broad definition includes everything from Reporting and Query Tools, Application
Development Tools to Visual Analytics Software, Navigational Tools (OLAP viewers).
Name
Licence type
(Free /
proprietary)
Version /
Year of
release
Are you satisfied with this product?
[ ] Oracle
[ ] Microsoft
[ ] SAS Institute
[ ] SAP
[ ] Tableau
[ ] IBM Cognos
[ ] QlikView
[ ] Other brand.
Please name it:
____________
____________
[ ] Not using
2.1 Do you plan to change / update currently using software in the next
three years ? What are the reasons to do so ?
5
 Office Automation Tools for regular productivity and collaboration instruments
By Office automation tools we understand all software programs which make it possible to
meet office needs. In particular, an office suite therefore usually contains following software
programs: word processing, a spreadsheet, a presentation tool, a database, a scheduler.
Name
Licence type
(Free /
proprietary)
Version /
Year of
release
Are you satisfied with this product?
[ ] Microsoft Office
[ ] Corel WordPerfect
[ ] iWork
[ ] IBM‘s Lotus
SmartSuite
[ ] OpenOffice
[ ] Other brand.
Please name it:
____________
____________
[ ] Not using
2.2 Do you plan to change / update currently using software in the next
three years ? What are the reasons to do so ?
6
 Graphics and Publishing tools
Graphics and publishing tools are tools with ability to create one or more infographics from a
provided data set or to visualize information. There are a vast variety of tools and software
to create any kind of information graphics, depending on the organization’s needs.
Name
Licence type
(Free /
proprietary)
Version /
Year of
release
Are you satisfied with this product?
[ ] PSPP
[ ] SAS
[ ] SPSS
[ ] Stata
[ ] Statistical Lab
[ ] STATISTICA
[ ] Other brand.
Please name it:
____________
____________
[ ] Not using
2.3 Do you plan to change / update currently using software in the next
three years ? What are the reasons to do so ?
7
 Web services tools
Web services tools are used for creating and testing interaction between machines (M2M).
Name
Licence type
(Free /
proprietary)
Version /
Year of
release
Are you satisfied with this product?
[ ] Stylus Studio
[ ] Microsoft Visual
Studio
[ ] Apache Axis
[ ] SoapUI
[ ] JBoss
[ ] Other brand.
Please name it:
____________
____________
[ ] Not using
2.4 Do you plan to change / update currently using software in the next
three years ? What are the reasons to do so ?
3.
Interpretation and Data Analysis layer
The interpretation and data analysis layer is intended for statisticians and data scientists,
and would enable any data manipulation or unstructured activities. In this layer expert users
can carry out data mining or design new statistical strategies.
8
 Statistical Data Mining Tools
The overall goal of the data mining tools is to extract information from a data set and
transform it into an understandable structure for further use. Aside from the main goal of
the data mining tools they should also be capable to visualise data, which was extracted in
data mining process. Because of this feature a lot of tools from this category have been
already covered in the Graphics and Publishing tools section and are used in the Access
layer.
Name
Licence type Version /
(Free /
Year of
proprietary) release
Are you satisfied with this product?
[ ] IBM SPSS Modeler
[ ] SAS Enterprise Miner
[ ] STATISTICA Data Miner
[ ] Microsoft Analysis
Services
[ ] Oracle Data Mining
[ ] R
[ ] Other brand.
Please name it:
____________
____________
[ ] Not using
3.1 Do you plan to change / update currently using software in the next
three years ? What are the reasons to do so ?
9
 Business Intelligence Tools for data analysis in a direct connection with database.
Business Intelligence tools which allow users to create visual reports or dashboards and
other summaries of specific sets of data for trending and other data analysis needs.
Name
Licence type
(Free /
proprietary)
Version /
Year of
release
Are you satisfied with this product?
[ ] Eclipse BIRT Project
[ ] JasperReports
[ ] OpenOffice Base
[ ] Oracle Reports
[ ] SAS Web Report
Studio
[ ] SQL Server
Reporting Services
[ ] Crystal Reports
[ ] Zoho Reports
[ ] Other brand.
Please name it:
____________
____________
[ ] Not using
3.2 Do you plan to change / update currently using software in the next
three years ? What are the reasons to do so ?
10
 Tools for designing OLAP cubes
Name
Licence type
(Free /
proprietary)
Version /
Year of
release
Are you satisfied with this product?
[ ] SAS OLAP Cube
Studio
[ ] SQL Server Analysis
Services
[ ] Analytic Workspace
Manager 11g
]
Pentaho Schema
Workbench
[ ] Other brand.
Please name it:
____________
____________
[ ] Not using
3.3 Do you plan to change / update currently using software in the next
three years ? What are the reasons to do so ?
11
4.
Integration layer
The integration layer is where all operational activities needed for all statistical elaboration
processes are carried out. This means operations carried out automatically or manually by
operators to produce statistical information in an IT infrastructure. With this aim, different
subprocesses are predefined and preconfigured by statisticians as a consequence of the
statistical survey design in order to support the operational activities.
For the Integration layer there are mostly dedicated software applications usually defined as
Data Integration tools. This kind of software is used for metadata management and usually is
developed and implemented on NSI request. This is because of specific needs and
requirements from customer. It has a user friendly graphic interface to help the integration
of different input sources and their manipulation.
 Italy
Italy (Istat) has self-implemented system SIQual as metadata system. This is an information
system on quality, contains information on the execution of Istat primary surveys and
secondary studies and on activities developed to guarantee quality of the produced
statistical information metadata managing system developed solution. Also this is a tool for
generating quality reports.
Istat don't have a metadata managing system for operational activities yet.
 Lithuania
Statistics Lithuania don‘t use a single, centralized metadata management system yet. Most
of the systems have been developed independently of each other. Any kind of metadata can
be found in most of our systems. This is the reason why some of metadata are stored as
different copies in different systems.
Metadata related to quality of statistical data (such as relevance, accuracy, timeliness,
punctuality, accessibility, clarity, coherence and comparability), statistical method
descriptions are stored as free text using MS Office tools.
Almost all of metadata used to analyse and process statistical data of business surveys is
stored in Oracle DB with the much of the results processing being carried out in SAS. Only
one business survey is carried out in FoxPro, while all the statistical data and metadata of
social surveys is stored in MS SQL server.
Statistics Lithuania also uses several other software systems, which have some basic
metadata storage and management capability, in order to fulfil basic everyday needs.
 United Kingdom
United Kingdom’s Office for National Statistics (ONS) doesn’t have a single, centralised
metadata management system. The operational metadata systems are developed and
supported on a variety of technology platforms:
 Most business survey systems (including the business register) are run on Ingres
DBMS with the much of the results processing being carried out in SAS.
 Many of the older social surveys are using VB and SQL Server technology.
 Most new developments are carried out using Oracle / Java / SAS technology stack.
 Older systems supporting Life Events applications (births, marriages, deaths etc.) are
still maintained on Model 204 database.
12
As a result, each system or process supported by each of these technology implementations
have their own metadata, which are managed using the specific applications developed for
the statistical system storage, along with the data itself.
4.1 Please shortly describe software using for metadata handling and
processing in your NSI
4.2 Would you be willing to share your software with other ESS NSIs,
either the design and/or the tool ?
DESIGN:
TOOL:
5.
Source Layer
The Source Layer is where we locate all the activities related to storing and managing
internal or external data sources. Internal data are from direct data capturing carried out by
CAWI, CAPI or CATI while external data are from administrative archives, for example from
Customs Agencies, Revenue Agencies, Chambers of Commerce, Social Security Institutes.
Generally, data from direct surveys are well-structured so they can flow directly into the
integration layer. This is because NSIs have full control of their own applications. Differently,
data from others institutions’ archives must come into the S-DWH with their metadata in
order to be read correctly.
13
Name
[ ]
IBM Infosphere
DataStage
[ ]
Informatica
PowerCenter
[ ]
Oracle Warehouse
Builder
[ ]
SAS Data
Integration Studio
[ ]
SAP Business
Objects Data
Services
[ ]
Microsoft SQL
Server Integration
Services
[ ]
Pentaho Data
Integration (Kettle)
[ ]
CloverETL
[ ]
JasperETL
[ ]
Talend Data
Integration
[ ]
Other brand.
Licence type
(Free /
proprietary)
Version /
Year of
release
Are you satisfied with this product?
Please name it:
____________
____________
[ ]
Not using
14
5.1 Do you plan to change / update currently using software in the next
three years ? What are the reasons to do so ?
5.2 Would your NSI be willing to share software with other ESS members,
either the design and/or the tool(s) ?
If not, is there a particular reason ?
5.3 Is there something else you would like to mention/share ?
Thank You !!!
15
Download