Title: Definition of the functionalities of a metadata system to facilitate and support the operation of the S-DWH
WP: 1
Deliverable: 1.4
Version: 2.0 (Final)
Date: 30-9-2013
NSI: SE, SC, ONS, ISTAT
Authors: Maia Ennok, Kaia Kulla, Lars Goran Lundell (3.1), Colin Bowler (3.3), Viviana De Giorgi (3.4)
ESSNET ON MICRO DATA LINKING AND DATA WAREHOUSING IN PRODUCTION OF BUSINESS STATISTICS
ESSnet on Data Warehousing
Maia Ennok (Statistics Estonia)
INDEX
1. Introduction
   1.1 Target audience
   1.2 What is in this document?
2. Functionalities of a metadata system
   2.1 Metadata creation
   2.2 Metadata usage
   2.3 Metadata maintenance
   2.4 Metadata evaluation
3. Metadata functionalities by layers
   3.1 Source layer
      3.1.1 Metadata creation
      3.1.2 Metadata usage
      3.1.3 Metadata maintenance
      3.1.4 Metadata evaluation
   3.2 Integration layer
      3.2.1 Metadata creation
      3.2.2 Metadata usage
      3.2.3 Metadata maintenance
      3.2.4 Metadata evaluation
   3.3 Interpretation and data analysis layer
      3.3.1 Metadata creation
      3.3.2 Metadata usage
      3.3.3 Metadata maintenance
      3.3.4 Metadata evaluation
   3.4 Data access layer
      3.4.1 Metadata creation
      3.4.2 Metadata usage
      3.4.3 Metadata maintenance
      3.4.4 Metadata evaluation
      3.4.5 Describing functionalities: an example
   Table 1 Metadata functionalities
4. Functionalities of a metadata system based on running integrated metadata information system (iMETA) of Statistics Estonia
   4.1 Metadata creation
      4.1.1 Manual and automated creation
      4.1.2 Metadata repository/managing meta models
      4.1.3 Harvesting
      4.1.4 Data access authorization metadata creation
   4.2 Metadata usage
      4.2.1 Users
      4.2.2 Search and navigation
      4.2.3 Metadata export (output generation)
      4.2.4 Other systems and internationality
   4.3 Metadata maintenance
      4.3.1 Maintenance of metadata objects (insert, update, delete, versioning)
      4.3.2 Users and rights (of metadata) management
      4.3.3 User guides
   4.4 Metadata evaluation
      4.4.1 Use of standards
   4.5 Metadata by metadata functionality groups and layers
   Table 2 Metadata by metadata functionality groups and layers
5. Conclusion
1. Introduction
In an efficient metadata architecture, the tools used in the data warehousing implementation (ETL, data modelling, rules, etc.) produce metadata throughout the warehouse life cycle in a manner and format that allows it to be easily referenced and integrated into the metadata system.
A metadata solution should identify and prioritize the metadata functions that are most important for statisticians' and IT users' understanding, navigation and acceptance of the system.
The metadata lifecycle (described in Metadata Framework for Statistical Data Warehousing [1]) is divided into the following three basic phases:
1. Collection
Metadata should be captured as early as possible in the production process. When data is entered into the data warehouse, basic metadata must already exist in a correct form. Collection of metadata should be automated whenever possible.
2. Maintenance
Metadata must always be up to date. Processes must be in place to capture changes and to synchronize metadata with the changing architecture.
3. Deployment
Metadata must be available to users in the right form and with the right tools.
Different user categories need different metadata and have different requirements. End users want to use metadata to find and interpret the data they need easily and correctly. Data stewards want an inventory of what is stored in the data warehouse. Analysts want to compare the data sources. Programmers want to make sure that they use the standard names.
In order to meet these diverse needs of the different users of (meta)data, the statistical metadata must be managed and maintained in a metadata system that fulfils specific requirements.
[1] Lundell L.G. (2012) Metadata Framework for Statistical Data Warehousing, ver. 1.0. Deliverable 1.1
In this document we focus on the functionalities of a metadata system that facilitate and support the operation of the S-DWH.
A metadata system can be evaluated in terms of metadata exchange, usability, administration and tool reliability.
The components of a metadata system can be grouped into the following categories:
- creating metadata,
- automatic extraction and production,
- conversion between metadata formats,
- subject description,
- encoding, structure, and syntax,
- exchange/transfer of metadata,
- harvesting, indexing, search, and browse of metadata databases,
- metadata repositories and metadata storage,
- metadata display,
- integrated environments.
According to the Common Metadata Framework [2], the statistical metadata system (SMS) should be a tool that enables a statistical organization to perform the following functions effectively (we have found them relevant to the S-DWH as well):
- Planning, designing, implementing and evaluating statistical production (S-DWH) processes.
- Managing, unifying and standardizing workflows and processes.
- Documenting data collection, storage, evaluation and dissemination.
- Managing methodological activities, standardizing and documenting concept definitions and classifications.
- Managing communication with end users of statistical outputs and gathering user feedback.
- Improving the quality of statistical data and the transparency of methodologies. The SMS should offer a relevant set of metadata for all criteria of statistical data quality.
- Managing statistical data sources and cooperation with respondents.
- Improving discovery and exchange of data between the statistical organization and its users.
- Improving integration of statistical information systems with other national information systems.
- Disseminating statistical information to end users. End users need reliable metadata for searching, navigating and interpreting data.
- Improving integration between national and international organizations. International organizations increasingly require integration of their own metadata with the metadata of national statistical organizations in order to make statistical information more comparable and compatible, and to monitor the use of agreed standards.
- Developing a knowledge base on the processes of statistical information systems, to share knowledge among staff and to minimize the risks related to knowledge loss when staff leave or change functions.
- Improving administration of statistical information systems, including administration of responsibilities, compliance with legislation, performance and user satisfaction.
[2] Common Metadata Framework Part A, page 8 (http://www1.unece.org/stat/platform/display/metis/The+Common+Metadata+Framework)
1.1 Target audience
The aim of this document is to help statistical organizations improve the effectiveness of S-DWH metadata across all layers of the S-DWH and all phases of the statistical business process. It is intended as a tool to assist S-DWH experts/users (managers, designers, subject-matter specialists, methodologists, information technology experts, researchers etc.) in developing business cases for a new or enhanced SMS for the S-DWH.
1.2 What is in this document?
In this document, we focus on the functionalities of a metadata system that facilitate and support the operation of the S-DWH:
- functionalities by functionality groups (metadata creation, metadata usage, metadata maintenance, metadata evaluation);
- metadata functionalities by layers (source layer, integration layer, interpretation and data analysis layer, data access layer);
- a case study of metadata functionalities (the integrated metadata system (iMETA) of Statistics Estonia).
2. Functionalities of a metadata system
The main functions of a metadata system are to gather and store metadata in one place, to give an overview of metadata (queries, searches etc.), to create and maintain metadata, to evaluate metadata, and to manage access rights through role-based security.
The core requirements of a metadata system are record creation, modification and deletion, multi-value attributes, select-list menus, simple and advanced search, simple display, import and export using XML documents (or CSV files), links to other databases, cataloguing history, and authorization management.
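As an illustration of these core requirements, the following Python sketch shows a minimal in-memory repository supporting record creation, modification and deletion, a multi-value attribute, simple search, and export to XML and CSV. The class and field names (`MetadataRecord`, `MetadataRepository`, `activities`) are hypothetical and only the standard library is used; a real SMS would add persistence, cataloguing history and authorization.

```python
import csv
import io
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    """Hypothetical metadata object, e.g. a variable description."""
    record_id: str
    name: str
    # Multi-value attribute: the statistical activities that use the object.
    activities: list = field(default_factory=list)


class MetadataRepository:
    """Toy in-memory store: create, modify, delete, search, export."""

    def __init__(self):
        self._records = {}

    def create(self, record):
        if record.record_id in self._records:
            raise ValueError(f"duplicate id: {record.record_id}")
        self._records[record.record_id] = record

    def modify(self, record_id, **changes):
        record = self._records[record_id]
        for attr, value in changes.items():
            if not hasattr(record, attr):
                raise AttributeError(f"unknown attribute: {attr}")
            setattr(record, attr, value)

    def delete(self, record_id):
        del self._records[record_id]

    def search(self, text):
        # Simple search: case-insensitive substring match on the name.
        text = text.lower()
        return [r for r in self._records.values() if text in r.name.lower()]

    def export_xml(self):
        root = ET.Element("metadata")
        for r in self._records.values():
            item = ET.SubElement(root, "record", id=r.record_id)
            ET.SubElement(item, "name").text = r.name
            for activity in r.activities:
                ET.SubElement(item, "activity").text = activity
        return ET.tostring(root, encoding="unicode")

    def export_csv(self):
        buffer = io.StringIO()
        writer = csv.writer(buffer)
        writer.writerow(["id", "name", "activities"])
        for r in self._records.values():
            # The multi-value attribute is flattened with ';' as separator.
            writer.writerow([r.record_id, r.name, ";".join(r.activities)])
        return buffer.getvalue()


repo = MetadataRepository()
repo.create(MetadataRecord("V001", "Turnover", ["SBS", "STS"]))
repo.create(MetadataRecord("V002", "Number of employees"))
print(repo.export_csv())
```

Flattening the multi-value attribute with a separator keeps the CSV export one row per record, while the XML export can repeat the `activity` element instead.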
A metadata system has to satisfy the following requirements:
- provide different levels of information granularity,
- convert legacy systems and records into new ones,
- offer customized options for report generation,
- incorporate miscellaneous tools for metadata creation, retrieval and display,
- implement structured relations for existing metadata standards,
- enable multilingual processing (incl. Unicode character sets),
- provide a built-in process for managing the workflow evaluation of metadata,
- provide a role-based security system controlling access to all features of the system.
The Common Metadata Framework [3] presents a model for managing the phases of the SMS development life cycle. SMS management has the following phases: design, implementation, maintenance, use and evaluation. In this document we do not describe the management of these phases, but the phases themselves.
Considering all of the above, we can specify the following metadata functionality groups for the metadata system of an S-DWH:
- metadata creation;
- metadata usage;
- metadata maintenance;
- metadata evaluation.
[3] Common Metadata Framework Part A, page 26
Metadata management consists of the creation, usage, maintenance and evaluation of metadata.
Metadata management is covered in more depth in Recommendations and guidelines on the governance of metadata management in the S-DWH (deliverable 1.5) [4].
Metadata management also includes user training and writing a user guide for the metadata system.
2.1 Metadata creation
Metadata creation means entering metadata into the metadata system, either by creating new metadata or by collecting existing metadata.
In metadata creation, metadata objects, their definitions, the links between metadata objects and processes, and the metadata repository are created, partly by searching for, retrieving, exporting and downloading existing metadata.
List of metadata functionalities:
- manual creation;
- automated creation;
- harvesting from other systems:
  o automated extraction (an automated, regular process of collecting metadata descriptions from different sources to create useful aggregations of metadata and related services);
  o conversion of metadata;
  o manual import from files (XML, CSV);
- data access authorization metadata creation;
- implementing the metadata repository;
- creating links between metadata objects.
2.2 Metadata usage
According to the Metadata Framework, the users of S-DWH metadata can be both humans (statisticians, IT specialists, end users etc.) and machines (other systems).
[4] De Giorgi V., Lindelauf M. (2013) Recommendations and guidelines on the governance of metadata management in the S-DWH. Deliverable 1.5
All metadata must be deployed, meaning that metadata must be available to users in the right form and with the right tools.
Metadata is presented to other systems by a metadata system that is integrated with the other systems and S-DWH components.
List of metadata functionalities:
- search;
- navigation;
- metadata export;
- international use.
2.3 Metadata maintenance
The role of metadata maintenance is to ensure that all metadata stored in the metadata repository are up to date for ongoing use.
List of metadata functionalities:
- maintenance of metadata history (versioning, input, update, delete);
- updating meta models in the metadata repository;
- updating links between metadata objects;
- users and rights (of metadata) management.
2.4 Metadata evaluation
The metadata system should provide functionalities to evaluate the quality of metadata and to ensure that metadata are of high quality.
Metadata quality requirements are set out in Recommendations on the impact of (meta)data quality in the S-DWH (deliverable 1.2) [5].
Quality evaluation processes follow the indicators and requirements that must be implemented.
List of metadata functionalities:
- metadata validation (for example checking value domains and checking links between metadata objects);
- standards usage.
[5] Bowler C., Lindelauf M., Dressen J. (2013) Recommendations on the Impact of Metadata Quality in the Statistical Data Warehouse. Deliverable 1.2
3. Metadata functionalities by layers
Layers are defined in the S-DWH Business Architecture [6] document (deliverable 3.1), and metadata subsets by layer are defined in the Metadata Framework [7] (deliverable 1.1).
3.1 Source layer
The source layer is the entry point for data into the S-DWH. It is responsible for receiving and storing the original data from the NSI's internal or external sources and for making the data available to the ETL functions that bring data to the integration layer.
3.1.1 Metadata creation
In an ideal situation all metadata necessary to forward data from the source layer to
the integration layer have either already been created by the external data suppliers
and are delivered to the S-DWH, or can be created automatically either in the source
layer or in the integration layer. In any case, a minimum requirement is that the
technical metadata that describe the incoming data are provided by the data
suppliers.
If the metadata created by the external sources are delivered in standardised
formats, such as DDI, SDMX, etc., the source layer should be able to create the
metadata needed in the S-DWH by extracting them and, if necessary, converting
them to the required formats automatically.
Creating metadata by manually adding them to the S-DWH metadata repository
should be a last resort, but will probably often be necessary to some degree. For
example, metadata that documents a questionnaire may be created automatically or
may need manual creation depending on what software has been used for the
questionnaire design. Creating metadata in a controlled manner requires the use of a
relevant editing tool, preferably a dedicated one. The tool must let the operator enter
the required codes, texts, links, etc., in standardised formats, and should also allow
the operator to immediately validate the entered values. It should contain copy/paste
functions to make the task as manageable as possible.
Manual creation is necessary, for example, when a value domain (code list) is only available in hard copy.
[6] Laureti Palma A. (2012) S-DWH Business Architecture. Deliverable 3.1
[7] Lundell L.G. (2012) Metadata Framework for Statistical Data Warehousing, ver. 1.0. Deliverable 1.1
3.1.2 Metadata usage
The source layer itself uses relatively little metadata. It needs information on the sources, such as:
- responsibilities for data deliveries (who makes source data available to the S-DWH, which access rights are needed, etc.) and the methods to be used (whether data are delivered to the S-DWH, “pushed”; physically collected by the S-DWH from some agreed location, “pulled”; or accessed directly at the original location, “virtual storage”),
- if relevant and possible, the expected frequencies (when new source data will be available),
- source data formats (record layout, storage type, location).
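A minimal sketch, assuming Python dataclasses, of how the source information listed above could be recorded as metadata for each source. The class, field and enum names are hypothetical; the three enum members mirror the “pushed”, “pulled” and “virtual storage” cases described in the text.

```python
from dataclasses import dataclass, field
from enum import Enum


class DeliveryMethod(Enum):
    PUSHED = "pushed"            # the supplier delivers data to the S-DWH
    PULLED = "pulled"            # the S-DWH collects data from an agreed location
    VIRTUAL = "virtual storage"  # data are accessed at the original location


@dataclass
class SourceMetadata:
    """Hypothetical description of one data source as seen by the source layer."""
    source_name: str
    responsible: str             # who makes the source data available
    method: DeliveryMethod
    record_layout: str           # source data format
    storage_type: str
    location: str
    access_rights: list = field(default_factory=list)
    expected_frequency: str = "" # only if relevant and known

    def copied_into_sdwh(self) -> bool:
        # Pushed and pulled data end up stored in the S-DWH; virtual data do not.
        return self.method is not DeliveryMethod.VIRTUAL


register = SourceMetadata(
    "Business register", "NSI", DeliveryMethod.VIRTUAL,
    "register tables", "database", "original database",
)
print(register.copied_into_sdwh())
```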
One of the main tasks of the source layer is to act as the warehouse’s gatekeeper, the function that makes sure that all data entered into the S-DWH adhere to an agreed set of rules (recommendations on metadata quality are given in Recommendations on the Impact of Metadata Quality in the Statistical Data Warehouse [8]). These rules are expressed as technical and process metadata. This means that, in order to accept a delivery of source data (“raw data”) and allow it to be forwarded to the next layer, relevant and correct metadata must be available, i.e. they must already exist or be created.
3.1.3 Metadata maintenance
Some metadata are closely linked to one particular data delivery, e.g. the results of a
census, but others are valid for several deliveries, e.g. the same metadata refer to
several rounds of a survey. The first type may be delivered as part of the data
delivery, while the second type should be entered in advance, before the first data
delivery.
3.1.4 Metadata evaluation
Regardless of whether metadata are entered manually or created automatically they
must always be validated. New metadata should be compared with and checked
against already existing metadata and, if relevant, data to ascertain consistency
within the metadata repository, and between data and metadata.
[8] Bowler C. (2013) Recommendations on the Impact of Metadata Quality in the Statistical Data Warehouse. Deliverable 1.2
The source layer’s gatekeeper responsibility requires that all codes that appear in the data must appear in the metadata as enumerated value domains. Since many of these codes will be used as dimensions in the following layers, it is vital that no values are missing. A check that no mismatches exist must be carried out in the source layer, and any errors found must be corrected by editing the metadata or the data. If the metadata contain minimum and maximum values (e.g., a percentage value must be within the range 0-100), the corresponding data values should be checked and corrected when needed.
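The gatekeeper checks described above can be sketched as follows. This is a minimal illustration, assuming raw data arrive as a list of dictionaries and that the enumerated value domains and minimum/maximum values have already been read from the metadata repository into plain Python sets and tuples; the function name and the shape of the returned problem list are hypothetical.

```python
def gatekeeper_check(rows, value_domains, ranges):
    """Return (row index, variable, value, problem) for every violation found.

    rows:          raw data, one dict per record
    value_domains: variable -> set of codes in its enumerated value domain
    ranges:        variable -> (minimum, maximum) taken from the metadata
    """
    problems = []
    for i, row in enumerate(rows):
        # Every code in the data must exist in the enumerated value domain.
        for var, codes in value_domains.items():
            value = row.get(var)
            if value not in codes:
                problems.append((i, var, value, "code not in value domain"))
        # Values with metadata-defined limits must lie within them.
        for var, (low, high) in ranges.items():
            value = row.get(var)
            if value is None:
                problems.append((i, var, value, "missing value"))
            elif not low <= value <= high:
                problems.append((i, var, value, f"outside range {low}-{high}"))
    return problems


raw = [{"region": "01", "share": 45.0}, {"region": "99", "share": 120.0}]
for problem in gatekeeper_check(raw, {"region": {"01", "02"}}, {"share": (0, 100)}):
    print(problem)
```

Whether a reported problem is fixed by editing the data or the metadata (for example adding a genuinely new code to the value domain) remains a decision for the operator, as the text notes.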
3.2 Integration layer
According to the S-DWH Business Architecture [9], all clerical operational activities typical of a statistical production process are carried out in the integration layer. These are operations carried out, automatically or manually, by users to produce statistical information in an IT infrastructure.
This layer includes all sub-processes of GSBPM phase 5 and one sub-process of phase 6.
We also include some sub-processes of phases 1, 2 and 3 of the GSBPM that are relevant for the metadata of the S-DWH.
All classical ETL processes are covered in the integration layer of the S-DWH.
3.2.1 Metadata creation
Most statistical metadata is created manually; process metadata is created both manually and automatically; technical metadata, like quality metadata, is mostly created automatically.
Standards are used as far as possible for creating integration layer metadata: for example, the Neuchâtel model for statistical metadata and ESQRS for quality metadata.
Metadata harvesting depends on how the S-DWH is developed; for example, in the integration layer, process and technical metadata are usually created in the S-DWH and harvested by the metadata system.
If integration layer metadata is in another format, it should be converted to a suitable format; for example, transformation rules in collection systems are often in a different format from the one needed in data processing.
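A minimal sketch of such a conversion, assuming (hypothetically) that the collection system exports recode rules as CSV rows `variable,from_value,to_value` and that the processing step expects a nested dictionary. Both formats and the function names are illustrative only; real collection and processing systems will have their own formats.

```python
import csv
import io


def convert_recode_rules(csv_text):
    """Convert CSV recode rules from a (hypothetical) collection system into
    the nested mapping {variable: {from_value: to_value}} used in processing."""
    rules = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        mapping = rules.setdefault(row["variable"], {})
        if row["from_value"] in mapping:
            # Two rules for the same input value would make recoding ambiguous.
            raise ValueError(
                f"conflicting rule for {row['variable']}={row['from_value']}"
            )
        mapping[row["from_value"]] = row["to_value"]
    return rules


def apply_rules(record, rules):
    # Values without a rule are passed through unchanged.
    return {var: rules.get(var, {}).get(val, val) for var, val in record.items()}


exported = "variable,from_value,to_value\nsize,1,SMALL\nsize,2,LARGE\n"
rules = convert_recode_rules(exported)
print(apply_rules({"size": "2", "region": "01"}, rules))
```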
Data access metadata (authorisation metadata) is created for the data warehouse (data marts) and the data staging areas.
In the integration layer, the following metadata for integration is created in phases 1-3 of the GSBPM (Specify needs, Design, Build):
- For preparing the business case, metadata of the statistical activity is created (manually, by filling in the SMS according to standards such as ESMS, Neuchâtel etc.).
- For designing outputs, output variable metadata (algorithms) and output validation metadata (algorithms) are created.
- For designing variable descriptions, variable metadata (quality metadata according to ESQRS), classifier metadata and algorithms are created; metadata for new variables (creating algorithms) is created using the metadata of available variables.
- For designing the frame and sample methodology, frame, sample and stratum metadata are created.
- For designing production systems and workflow: data warehouse data model metadata is created; coding algorithms are created using classifier and coding table metadata; imputation algorithms are created using imputation methods; dissemination metadata (publication calendar) is created; algorithms for statistical confidentiality are created; data processing algorithms (incl. aggregation) and scheduling metadata are created; questionnaires are designed; the data model of raw data is created; raw data validation algorithms are created; and pre-fill metadata is created.
- For configuring workflows, scheduling technical metadata is created.
- For building the data collection instrument, technical metadata on the data collection structure is created.
NB! If new needs arise in later phases, the earlier phases must be checked to see whether the metadata needs to be changed.
[9] Laureti Palma A. (2012) S-DWH Business Architecture. Deliverable 3.1
Metadata of phase 5 – Process
- For data integration, process metadata (where the data come from, i.e. data source metadata) and quality indicators for the integration process are created.
- For classify & code, process metadata is created.
- For review, validate & edit, quality variables (edit failure rate – ESQRS) and process metadata are created.
- For imputation, quality variables, quality variable metadata (imputation rate – ESQRS) and process metadata (how, who, when etc.) are created.
- For deriving new variables and statistical units, process metadata and quality variable metadata (rate of new variables and new statistical units) are created.
- For weights calculation, process metadata is created automatically.
- For aggregate calculation, process metadata is created automatically.
- For finalizing data files, data finalization metadata (date, person etc.) and metadata on data loading to the S-DWH (logs, date, person etc.) are created.
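Quality indicators such as the edit failure rate and the imputation rate can be produced automatically from processing logs. The sketch below computes them as simple unweighted ratios per variable (failed edits, or imputed values, divided by all processed values); this is an assumption for illustration, and the exact definitions required by the applicable quality reporting standard (for example weighted variants) should be checked before use. The log format is hypothetical.

```python
from collections import Counter


def quality_rates(log):
    """Unweighted edit failure and imputation rates per variable.

    log: one dict per processed value, e.g.
         {"variable": "turnover", "failed_edit": True, "imputed": False}
         (hypothetical format).
    """
    total, failed, imputed = Counter(), Counter(), Counter()
    for entry in log:
        var = entry["variable"]
        total[var] += 1
        failed[var] += int(entry["failed_edit"])
        imputed[var] += int(entry["imputed"])
    return {
        var: {
            "edit_failure_rate": failed[var] / n,
            "imputation_rate": imputed[var] / n,
        }
        for var, n in total.items()
    }


processing_log = [
    {"variable": "turnover", "failed_edit": True, "imputed": True},
    {"variable": "turnover", "failed_edit": False, "imputed": False},
    {"variable": "turnover", "failed_edit": True, "imputed": False},
    {"variable": "turnover", "failed_edit": False, "imputed": False},
]
print(quality_rates(processing_log))
```

The resulting figures would then be stored as quality variable metadata for the corresponding variables.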
Metadata of phase 6 – Analyze
- For draft output preparation, output variable metadata (algorithms) is created using output variable metadata, classifier metadata is created, and table titles are created (using output variable metadata).
3.2.2 Metadata usage
Metadata users in the integration layer are both humans and machines. In every process of the integration layer, metadata should be navigable and searchable (for example, browsing variable metadata by statistical activity and domain).
All metadata objects in the metadata system are related (for example, a variable is related to a statistical activity and a classifier).
Metadata is multilingual (English and the local language) and can be shared internationally via unified services in a standard format (such as XML or SDMX).
The S-DWH shares its metadata with other systems via the metadata system. In the S-DWH, a data object references a metadata object in the metadata system (for example, by metadata object id). Integration layer metadata can be exported from the metadata system. The S-DWH uses metadata from the metadata system, which also retrieves metadata from other systems.
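The idea that a data object stores only a reference to a metadata object, which is resolved in the metadata system and can be shared in a multilingual form, can be sketched as follows. The identifiers, labels and XML layout are hypothetical; the export uses plain XML with `xml:lang` attributes rather than SDMX, which has its own message structures.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical content of the metadata system: object id -> labels by language.
METADATA_SYSTEM = {
    "VAR_TURNOVER": {"en": "Turnover", "et": "Käive"},
}


@dataclass
class DataColumn:
    """A data object in the S-DWH that stores only a reference to metadata."""
    column_name: str
    metadata_id: str


def describe(column, language="en"):
    # Resolve the reference in the metadata system instead of copying labels;
    # fall back to English when the requested language is not available.
    labels = METADATA_SYSTEM[column.metadata_id]
    return labels.get(language, labels["en"])


def export_labels_xml(metadata_id):
    # Share one metadata object in a language-tagged XML form.
    root = ET.Element("variable", id=metadata_id)
    for lang, text in METADATA_SYSTEM[metadata_id].items():
        ET.SubElement(root, "name", {"xml:lang": lang}).text = text
    return ET.tostring(root, encoding="unicode")


column = DataColumn("trn_eur", "VAR_TURNOVER")
print(describe(column, "et"))
print(export_labels_xml(column.metadata_id))
```

Because the data object holds only the id, a corrected label in the metadata system is immediately visible to every data object that references it.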
In the integration layer, the following metadata is used in phases 1-3 of the GSBPM:
- For checking data availability, metadata of available administrative data is used.
- For designing production systems and workflow, variable metadata, table/column metadata, classifier and coding table metadata, imputation methods, variable descriptions and the metadata of available variables are used.
- For configuring workflows, scheduling, statistical and process metadata are used.
NB! If new needs arise in later phases, the earlier phases must be checked to see whether the metadata needs to be changed.
Phase 5 – Process
- For data integration, pre-filling metadata, sample metadata, the data model of raw data, variable metadata and collection set-up metadata are used.
- For classify & code, coding algorithms, classifiers and coding table metadata are used.
- For imputation, imputation algorithms are used.
- For deriving new variables and statistical units, algorithms for creating new variables are used.
- For calculating weights, stratum and frame metadata are used.
- For calculating aggregates, aggregation methods, variable metadata and classifier metadata are used.
- For finalizing data files, data warehouse data model metadata is used.
Phase 6 – Analyze
- For draft output preparation, output variable metadata and classifier metadata are used.
3.2.3 Metadata maintenance
The main functionalities of metadata maintenance at the integration layer level are:
- Maintaining (creating, updating, deleting, versioning) integration metadata (data processing algorithms, data warehouse data models etc.).
- Managing user rights according to the S-DWH system operations of all S-DWH processes in all layers (for example data transformation, data loading). The S-DWH has the following operations: read metadata, create data processing packages, access to delicate data, solve data processing tasks, schedule packages, see logs etc.
- Maintaining meta models: integration metadata can be stored in different meta models, e.g. a meta model for statistical metadata, one for process metadata etc.
- The S-DWH uses a single metadata repository (which can contain different metadata models for different metadata subsets).
- All users of the S-DWH can access metadata (for viewing only), but privileges for changing metadata must be granted by S-DWH operation and statistical activity.
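Rights that depend on both the S-DWH operation and the statistical activity can be sketched as a simple role-based check. The operation names are taken from the list above; the users, the grant table and the `*` wildcard meaning “all statistical activities” are hypothetical assumptions for illustration.

```python
# Operations named in the text.
OPERATIONS = {
    "read metadata",
    "create data processing packages",
    "access to delicate data",
    "solve data processing tasks",
    "schedule packages",
    "see logs",
}

# Hypothetical grants: user -> set of (operation, statistical activity) pairs.
GRANTS = {
    "analyst1": {("create data processing packages", "SBS")},
    "operator1": {("schedule packages", "*"), ("see logs", "*")},
}


def is_allowed(user, operation, activity):
    if operation not in OPERATIONS:
        raise ValueError(f"unknown operation: {operation}")
    if operation == "read metadata":
        return True  # every S-DWH user may view metadata
    granted = GRANTS.get(user, set())
    # "*" is an assumed wildcard covering all statistical activities.
    return (operation, activity) in granted or (operation, "*") in granted


print(is_allowed("analyst1", "create data processing packages", "STS"))
```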
3.2.4 Metadata evaluation
When integration layer metadata (data processing algorithms) is created, it is validated: the presence of required values is checked, data types are controlled, only existing objects may be linked, and data models must include comments. Metadata is validated according to the applicable standards.
Some evaluation controls are built into the SMS's metadata fill-in processes, some are systematic built-in processes for managing the workflow evaluation of metadata (validation queries), and some are organizational processes.
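The built-in validation controls listed above (required values, data types, links only to existing objects, documented models) can be sketched for a single algorithm metadata record. The field names, including `description` standing in for the required comment, are hypothetical.

```python
# Hypothetical required fields of an algorithm metadata record and their types.
REQUIRED_FIELDS = {
    "id": str,
    "name": str,
    "description": str,  # stands in for the required comment/documentation
    "input_variables": list,
    "expression": str,
}


def validate_algorithm(record, existing_ids):
    """Return a list of error messages; an empty list means the record passed."""
    errors = []
    for name, expected_type in REQUIRED_FIELDS.items():
        value = record.get(name)
        if value is None or value == "" or value == []:
            errors.append(f"{name}: required value missing")
        elif not isinstance(value, expected_type):
            errors.append(f"{name}: expected {expected_type.__name__}")
    # Links may only point to metadata objects that already exist.
    refs = record.get("input_variables")
    if isinstance(refs, list):
        for ref in refs:
            if ref not in existing_ids:
                errors.append(f"input_variables: unknown object {ref}")
    return errors


existing = {"VAR_TURNOVER", "VAR_EMPL"}
draft = {"id": "ALG2", "name": 5, "input_variables": ["VAR_X"], "expression": "VAR_X * 2"}
print(validate_algorithm(draft, existing))
```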
3.3 Interpretation and data analysis layer
This layer is mainly aimed at ‘expert’ (i.e. statisticians/domain experts, and data
scientists) users for carrying out advanced analysis, and data manipulation and
interpretation functions, and access would be mainly interactive. The work in
generating the analysis is effectively a design of potential statistical outputs. This
layer might produce such things as data-marts, which would contain the results of the
analysis. In many cases, however, the results of an investigation into the data
required for a particular analysis may identify a shortfall in the availability of the
required information. This may trigger the identification of requirements for whole new
sets of variables, and methodologies around the processing of them.
The following are some examples of the metadata used in this layer, and the GSBPM
sub-processes where they might apply.
3.3.1 Metadata creation
- Creation of a design for a new analysis or output definition [Note: the analysis might not actually become an output – it could be experimental at first] (process 2.1 – Design outputs).
- Manual creation of variable definitions for the new analysis or output (process 2.2 – Design Variable Descriptions).
- Creation of the methodology design for the statistical processing, e.g. creation of specifications for imputation, validation etc. (process 2.5 – Design statistical processing methodology).
- Manual creation of SQL scripts (or other data manipulation programming media) encompassing the data selection rules required to identify the data to be used in the analysis. This might also include data matching and linking routines (active metadata) (process 5.1 – Integrate Data).
- Manual creation of a quality report (reference metadata) relating to the new special analysis after it has been run (process 6.5 – Finalise Outputs).
- Manual creation of interpretation document metadata in text form to accompany any data sets, and creation of graphics/charts etc. to accompany the data and help the user interpret it (passive metadata; process 6.5 – Finalise Outputs).
3.3.2 Metadata usage
- Examine the available metadata describing administrative data sources and methodologies, in order to evaluate their suitability for a new analysis or identified output; also examine the descriptions of existing variables to check data availability and suitability for inclusion in the analysis or output (process 1.5 – Check data availability).
- View lists of variables, code lists and classifications, and their definitions, to determine the structural metadata items to be used as search criteria to provide data for the analysis and data integration. These variables would also serve as the essential dimensions of the fact tables of the data marts (process 5.1 – Integrate Data).
- Run the SQL scripts (or algorithms encoded in other programming media) to extract and integrate the data from different sources in the data warehouse (process 5.1 – Integrate Data).
- Use disclosure rule metadata in the disclosure checking process for the intended output datasets created by running the analysis (process 6.4 – Apply disclosure control).
- Use quality metadata as input to any interpretation documentation accompanying output data sets (e.g. when assessing ‘measures of uncertainty’ for the output) (process 6.5 – Finalise Outputs).
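SQL scripts as active metadata, stored in the repository and executed to select and integrate data, can be sketched with Python's built-in `sqlite3` module. The table, columns and metadata entry are hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3

# Hypothetical metadata entry holding an SQL selection rule (active metadata).
ANALYSIS_METADATA = {
    "id": "ANL_01",
    "description": "Turnover of enterprises with 10 or more employees, by region",
    "selection_sql": (
        "SELECT region, SUM(turnover) AS turnover "
        "FROM enterprise WHERE employees >= ? "
        "GROUP BY region ORDER BY region"
    ),
    "parameters": (10,),
}


def run_analysis(conn, meta):
    # The script is read from the metadata, not hard-coded in the program.
    return conn.execute(meta["selection_sql"], meta["parameters"]).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enterprise (region TEXT, employees INTEGER, turnover REAL)")
conn.executemany(
    "INSERT INTO enterprise VALUES (?, ?, ?)",
    [("North", 12, 100.0), ("North", 3, 50.0), ("South", 40, 250.0)],
)
print(run_analysis(conn, ANALYSIS_METADATA))  # [('North', 100.0), ('South', 250.0)]
```

Keeping the selection rule and its parameters in the metadata entry means the analysis can be rerun, reviewed or archived together with its description.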
3.3.3 Metadata maintenance
- Check that appropriate rights exist in the S-DWH for a user who is attempting to create a new analysis design (check that the user has write access to the appropriate area of the metadata repository), and similarly that read rights exist for accessing certain data groups.
- Delete old or defunct analysis descriptions, and their associated SQL scripts, as part of a maintenance/archive function.
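Deleting defunct descriptions as part of an archive function does not have to destroy their history. The sketch below is a minimal illustration under the assumption that each metadata object keeps a list of dated versions: updates close the current version and append a new one, and a logical delete closes the current version while keeping all earlier ones. The class names and the validity-date convention (`valid_to` exclusive) are hypothetical.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Version:
    number: int
    definition: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


class VersionedObject:
    """Hypothetical metadata object whose updates never overwrite history."""

    def __init__(self, definition, valid_from):
        self.history = [Version(1, definition, valid_from)]
        self.deleted = False

    def current(self):
        return self.history[-1]

    def update(self, definition, valid_from):
        if self.deleted:
            raise ValueError("cannot update a deleted object")
        last = self.current()
        # Close the current version, then append the new one.
        self.history[-1] = replace(last, valid_to=valid_from)
        self.history.append(Version(last.number + 1, definition, valid_from))

    def delete(self, on):
        # Logical delete: close the current version but keep the history.
        self.history[-1] = replace(self.current(), valid_to=on)
        self.deleted = True

    def as_of(self, day):
        for v in self.history:
            if v.valid_from <= day and (v.valid_to is None or day < v.valid_to):
                return v
        return None


analysis = VersionedObject("Turnover by region, first draft", date(2012, 1, 1))
analysis.update("Turnover by region, revised selection", date(2013, 1, 1))
print(analysis.as_of(date(2012, 6, 30)).definition)
```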
3.3.4 Metadata evaluation
- Recording quality characteristics from the different elements of the analysis during the preparation of a draft output. This might take the form of quality indicator attributes attached to variables (process 6.1 – Prepare Draft Outputs).
- Following evaluation of the output as a whole, the statistical content would need to have some approval status metadata attached or set (process 6.5 – Finalise Outputs).
3.4 Data access layer
The access layer is the fourth and last layer identified in a generic S-DWH. It is the layer at the end of the S-DWH process that, together with the interpretation layer, represents the operational IT infrastructure. The access layer is the layer for the final presentation, dissemination and delivery of the information sought [10].
3.4.1 Metadata creation
Metadata creation for S-DWH data at the data access level is essentially an operation of converting/harvesting metadata already created in the other layers, so that it can be used for dissemination. What is needed at this level is a procedure for harvesting the metadata already provided.
The only metadata actually created at this level is of the following types:
1. metadata about data access, for example statistics on users' access to data and metadata: which data are most requested, for which year, at which level of disaggregation, etc.;
2. users' evaluation metadata, e.g. an assessment of how easy it is to find information.
Metadata about users and uses is created automatically. Users' evaluation metadata should also be generated automatically.
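Access metadata of the first type can be derived automatically from an access log. The following sketch, assuming a hypothetical log with one entry per request (table, reference year, breakdown), counts requests per table, year and disaggregation with `collections.Counter`.

```python
from collections import Counter


def access_statistics(log):
    """Count requests by table, reference year and breakdown.

    log: one dict per request, e.g.
         {"table": "T01", "year": 2012, "breakdown": "region"} (hypothetical).
    """
    return {
        "by_table": Counter(entry["table"] for entry in log),
        "by_year": Counter(entry["year"] for entry in log),
        "by_breakdown": Counter(entry["breakdown"] for entry in log),
    }


def most_requested_table(log):
    # most_common(1) returns a list holding one (table, count) pair;
    # an empty log therefore raises IndexError.
    return access_statistics(log)["by_table"].most_common(1)[0][0]


requests = [
    {"table": "T01", "year": 2012, "breakdown": "region"},
    {"table": "T01", "year": 2013, "breakdown": "NACE"},
    {"table": "T02", "year": 2012, "breakdown": "region"},
]
print(most_requested_table(requests))
```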
3.4.2 Metadata usage
At the access level, the main users of data/metadata are final users (researchers, students, organizations, etc.), who want to know the general meaning of the data as well as their accuracy, availability and other important quality aspects. This enables them to identify and retrieve potentially relevant statistical data for a certain study, research or purpose, and to interpret and (re)use statistical data correctly. Metadata concerning the quality, content and availability aspects of data and processes is an important part of a feedback system, as are users' evaluations and users' data access.
[10] Laureti Palma A. (2012) S-DWH Business Architecture. Deliverable 3.1
The main functionalities/uses/purposes of metadata at the data access layer level
are:

searching data/metadata, which consists of identifying the existence of
specific information;

locating data/metadata, which means tracing a specific occurrence of
data/metadata;

selecting and filtering data/metadata;

obtaining information on data/metadata availability, by directly querying the data and metadata repositories;

obtaining feedback/evaluation from users, by compiling statistics on the most/least searched tables, graphs and data series; using systems for gathering ratings of ease of use, understandability, usefulness, the glossary, etc.; tracking help usage; giving a questionnaire to a random sample of users;

analysing, evaluating and assessing information, for example users’ feedback, by using statistical tools;

providing for the accessibility and availability of data/metadata to other systems or services managed by others, for example access from external wikis and web sites;

interoperating and cooperating semantically, which means enabling researchers to exchange information through a series of equivalences among the information addressed, so as to best facilitate the coding, transmission and use of data when content-dependent metadata coming from the other layers evolves in response to changes;

including metadata for language variations, language limiters to filter
search results, abstracts in various languages; integrating multilingual
thesauri and international classification schemes; providing links to
translations, related databases and web sites, etc.
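The searching, selecting/filtering and language-limiter functionalities listed above can be sketched as follows. This is a minimal illustration assuming a hypothetical catalogue in which each disseminated item carries labels keyed by language code; a production system would use a proper full-text index rather than substring matching.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataItem:
    """A hypothetical disseminated metadata record with labels in several languages."""
    item_id: str
    kind: str  # e.g. "table", "graph", "series"
    labels: dict[str, str] = field(default_factory=dict)  # language code -> label


def search(items, text, language=None, kind=None):
    """Case-insensitive substring search, optionally limited by language and item kind."""
    needle = text.casefold()
    hits = []
    for item in items:
        if kind is not None and item.kind != kind:
            continue  # filter by type of object (table, graph, series)
        langs = [language] if language else list(item.labels)  # language limiter
        if any(needle in item.labels.get(lang, "").casefold() for lang in langs):
            hits.append(item.item_id)
    return hits


catalogue = [
    MetadataItem("T1", "table", {"en": "Employment by county"}),
    MetadataItem("G7", "graph", {"en": "Employment rate trend"}),
    MetadataItem("T2", "table", {"en": "Retail trade turnover", "et": "Jaekaubanduse käive"}),
]
print(search(catalogue, "employment"))  # ['T1', 'G7']
print(search(catalogue, "käive", language="et"))  # ['T2']
print(search(catalogue, "employment", kind="table"))  # ['T1']
```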
3.4.3 Metadata maintenance
The main functionalities/actions of metadata maintenance at the data access layer
level are:

creating, updating, deleting, reviewing metadata;

harmonising and exploiting data/metadata so that it can be re-used;

maximising the utility and usability of data/metadata, improving and promoting data/metadata re-use, e.g. by externalising metadata, which means converting it into shared metadata that serves more than one object (several tables or graphs);

exporting metadata for use in other systems, converting metadata from one standard format to another with a metadata translation engine, which should be configurable to support any metadata standard or profile;

managing resources, i.e. managing data/metadata libraries through the
catalogues and descriptors;

managing authentication for data/metadata access;

managing metadata about users of data and controlled use of data, e.g.
describing how data can be accessed, when, by whom and with which
restrictions and constraints.
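The configurable metadata translation engine mentioned above can be approximated, in its simplest form, by profile-driven field mappings. The sketch below assumes hypothetical profiles; the target field names are illustrative only and are not the official concept identifiers of ESMS or any other standard. A real engine would also transform values and structures, not just rename fields.

```python
# Hypothetical profiles; a real engine would load these from configuration.
PROFILES = {
    "internal_to_esms_like": {
        "name": "title",
        "contact": "contact_organisation",
        "updated": "metadata_last_update",
    },
}


def translate(record, profile, profiles=PROFILES, strict=False):
    """Rename the fields of a metadata record according to a configured profile."""
    mapping = profiles[profile]
    unmapped = sorted(set(record) - set(mapping))
    if strict and unmapped:
        raise KeyError(f"no mapping for fields: {unmapped}")
    return {mapping[key]: value for key, value in record.items() if key in mapping}


def invert(mapping):
    """Derive the reverse profile, so one configuration serves export and import."""
    return {target: source for source, target in mapping.items()}


PROFILES["esms_like_to_internal"] = invert(PROFILES["internal_to_esms_like"])

record = {"name": "Retail trade turnover", "contact": "Statistics Estonia", "updated": "2013-09-30"}
exported = translate(record, "internal_to_esms_like")
print(exported)
print(translate(exported, "esms_like_to_internal") == record)  # True
```

Keeping the mappings as data rather than code is what lets one engine support additional standards or profiles by configuration alone.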
3.4.4 Metadata evaluation
To ensure that metadata is of good quality,11 which here means that it provides unambiguous and well-defined access to the data, at least the following functionalities/properties should be implemented:

using domain-independent metadata properties

ensuring valid values and structures, and appropriate default values

creating procedures to verify individual items

aggregating metadata for web searching, e.g. clustering it into a tree graph or organizing it so that searching is easier and more standardised

using standardised and harmonised metadata formats for official
statistics and implementing specific components for other typologies of
dissemination

standardising remote data/metadata access

using automated/assisted procedures to update metadata as soon as it
is available

synchronising the dissemination of metadata with the dissemination of
the data to which it applies

implementing side services that require the integration of metadata and
the cooperation in the acquisition of new metadata

creating multilingual aids for users

facilitating coordination between administrations that may have common needs for using common resources, e.g. by managing permissions for administrators at different layers so that they understand each other’s concerns, by using uniform programming languages, permission types, etc., and by making it possible to submit requests such as proposals, notifications and comparisons

taking into account the international and future vision of the S-DWH, e.g. promoting international visibility while serving national interests.
11 The six dimensions of data quality are relevance, accuracy, timeliness, accessibility, interpretability and coherence. They can be applied to metadata as well.
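One way to aggregate metadata for web searching, as listed above, is to cluster items into a tree that users can browse. The sketch below assumes, hypothetically, that each item is identified by a slash-separated path of domain, sub-domain and title; the real grouping criteria would come from the metadata itself (e.g. statistical domains or classifications).

```python
def build_tree(paths):
    """Cluster metadata items into a nested tree keyed by their path segments."""
    tree = {}
    for path in paths:
        node = tree
        for segment in path.split("/"):
            node = node.setdefault(segment, {})
    return tree


def render(tree, indent=0):
    """Return the tree as indented lines, e.g. for a browsable search page."""
    lines = []
    for key in sorted(tree):
        lines.append("  " * indent + key)
        lines.extend(render(tree[key], indent + 1))
    return lines


items = [
    "Economy/Business statistics/Enterprises by size",
    "Economy/Business statistics/Turnover by activity",
    "Population/Census 2011/Dwellings",
]
print("\n".join(render(build_tree(items))))
```

Running this prints the two top-level domains, each with its sub-domain and items indented below it.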
3.4.5 Describing functionalities: an example
Discussions of metadata in the information resource community (libraries, archives, museums and other information centres) tend to group metadata elements by the various functions they support. The result is the identification of different types of metadata (or metadata classes), each of which comprises multiple metadata elements.12
The metadata framework for statistical data warehousing identifies metadata categories and metadata subsets.13 For the S-DWH business architecture, the GSBPM sub-phases at the level of the data access layer are defined.14
Table 1 gives an example illustrating metadata functionalities.15 It first identifies the functions used at the data access layer, the input needed and the output obtained, and then the categories/subsets of metadata used. The last column identifies one of the corresponding GSBPM sub-phases that each function implies.
12 Greenberg J. (2005). Understanding Metadata and Metadata Schemes. The Haworth Press, Inc.
13 Lundell L.G. (2012). Metadata Framework for Statistical Data Warehousing, ver. 1.0.
14 Laureti Palma A. (2012). S-DWH Business Architecture. Deliverable 3.1.
15 The list is not exhaustive and regards only the data access layer.
Table 1 Metadata functionalities
Function name | Description | Input | Output | Metadata categories | Metadata subsets | GSBPM phase
Data/metadata display; easier/simple/advanced search | Allows the user to display and search the data/metadata by setting a basic/minimum/detailed set of search criteria | Search parameters | List of metadata (HTML or other formats) | Statistical, Process, Quality | Active/passive, Formalised, Reference/structural | 7.4
Navigator; on-line help; dictionary | Allows the user to interact with the metadata system in order to search data/metadata of interest, view and understand them | - | Maps, dictionary items, list of objects | - | Passive, Free form | 7.4
Download/Export | Allows the user to download the data/metadata about data/services | - | Metadata file | Statistical, Process, Quality | Active/passive, Formalised, Reference/structural | 7.3
Login/logout | Allows the user to log in to/terminate the reserved session, by type of user | Access credentials | (Possible) error messages | Authorisation | Active, Formalised, Structural | 7.1
Users management (administrator) | Allows the administrator to manage the register of accredited users/levels of users | Alphanumeric parameters | Information on users | Authorisation, Technical | Active, Formalised, Structural | 7.5
Monitor users' access (administrator) | Allows the administrator to monitor accesses and operations executed by users | - | - | Authorisation, Technical | Active, Formalised, Structural | 7.3
Status change (administrator) | Allows the administrator to change the "users' status" as a consequence of modified characteristics | Specified parameters | - | Authorisation, Technical | Active, Formalised, Structural | 7.1
4. Functionalities of a metadata system based on running
integrated metadata information system (iMETA) of
Statistics Estonia
In 2011 Statistics Estonia developed an integrated metadata system (iMETA) and put it into production; among other uses, it has served the statistical data warehouse of the 2011 Population and Housing Census.
4.1 Metadata creation
4.1.1 Manual and automated creation
Metadata can be created either by automated information processing or manually.
In iMETA, metadata can be created manually by filling in forms or by importing metadata from files (for example CSV files). An automated process can create metadata by scanning database table structures, together with their descriptions, into the metadata repository.
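The automated creation of technical metadata by scanning table structures can be sketched with Python's built-in `sqlite3` module. This is an illustration only: iMETA scans Statistics Estonia's own production databases, and SQLite is used here merely because it is self-contained. SQLite has no column comments, so the `description` field is left empty, on the assumption that it would be filled in manually or read from another DBMS's catalogue.

```python
import sqlite3


def scan_schema(conn):
    """Scan user tables and their columns into technical metadata records."""
    records = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # pragma_table_info is SQLite's table-valued form of PRAGMA table_info.
        columns = conn.execute(
            'SELECT cid, name, type, "notnull", pk FROM pragma_table_info(?) ORDER BY cid',
            (table,),
        )
        for cid, column, col_type, notnull, pk in columns:
            records.append({
                "table": table,
                "column": column,
                "position": cid,
                "data_type": col_type,
                "mandatory": bool(notnull),
                "primary_key": bool(pk),
                "description": None,  # SQLite has no column comments; fill in manually
            })
    return records


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE enterprise (id INTEGER PRIMARY KEY, name TEXT NOT NULL, nace_code TEXT)"
)
records = scan_schema(conn)
for record in records:
    print(record["table"], record["column"], record["data_type"])
```

The resulting records correspond roughly to the technical characteristics (databases, tables, columns) kept in the RDBMS meta model described below.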
4.1.2 Metadata repository/managing meta models
Statistics Estonia has a single metadata repository in which all metadata is stored. iMETA is one of the applications that use this repository; the data processing system (VAIS) and the user role management application (URMA) also create and maintain their metadata there. In the iMETA user interface, users can define and maintain links between metadata in different meta models and between different sets of metadata.
The metadata system is based on a meta-meta model containing several meta models. Some meta models are created and managed by different systems, but all metadata is stored in this one metadata repository.
Meta models in the metadata repository:

Neuchâtel – terminology metadata (iMETA). The meta model for the integrated statistical metadata system; it contains statistical metadata such as variables, classifiers, code lists, etc.

RBAC – role-based access control meta model for the separate user and role management application (URMA). User and role metadata can be viewed in the metadata navigator. This meta model contains authorisation metadata such as roles and privileges by S-DWH operation.

XDTL – extensible data transformation language meta model for S-DWH ETL objects. This meta model contains process metadata such as ETL procedures (processing algorithms), etc.

RDBMS – relational database management system meta model for database metadata. Technical metadata is created in the databases themselves and then scanned into the integrated metadata system (managed manually in iMETA, or automatically by importing the scanned structures of database elements). This meta model contains technical metadata such as server names, locations, databases, tables, columns, etc.
The principle is that metadata is entered where it is formed (in the process and unit concerned).
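The combination of one shared repository, several meta models and the principle that metadata is entered where it is formed can be sketched as a repository that lets any system read but only the owning system write. The ownership map is hypothetical: Neuchâtel/iMETA and RBAC/URMA follow the text above, while RDBMS/iMETA and XDTL/VAIS are assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical map of which application writes each meta model (see lead-in).
OWNERS = {"Neuchatel": "iMETA", "RBAC": "URMA", "RDBMS": "iMETA", "XDTL": "VAIS"}


@dataclass(frozen=True)
class MetaObject:
    meta_model: str  # which meta model the object belongs to
    object_id: str
    name: str


class Repository:
    """A single store shared by all applications and all meta models."""

    def __init__(self, owners):
        self._owners = owners
        self._objects = {}

    def put(self, system, obj):
        # Metadata is entered where it is formed: only the owning system may write.
        if self._owners.get(obj.meta_model) != system:
            raise PermissionError(f"{system} may not write {obj.meta_model} objects")
        self._objects[(obj.meta_model, obj.object_id)] = obj

    def get(self, meta_model, object_id):
        # Reading is open to every system.
        return self._objects[(meta_model, object_id)]


repo = Repository(OWNERS)
repo.put("iMETA", MetaObject("Neuchatel", "VAR_TURNOVER", "Turnover"))
repo.put("URMA", MetaObject("RBAC", "ROLE_ANALYST", "Analyst"))
print(repo.get("RBAC", "ROLE_ANALYST").name)  # Analyst
```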
4.1.3 Harvesting

Metadata captured automatically by computers can include when a metadata value was created, who created it and when it was last updated. Each metadata element is created and maintained in one system/place, but it can be viewed and used in several systems. The full history is also stored. For efficient metadata retrieval, a database API can be used.

Metadata text files (UTF-8 CSV files) can be imported through the user application.

The attributes of metadata elements make it easy to convert metadata from legacy and other systems into new ones. An automated process can create metadata by scanning database table structures, together with their descriptions, into the metadata repository; scanning and storing the metadata converts it into a suitable format. Because the metadata system holds metadata in different formats, linked to one another, metadata can easily be converted to another format (Neuchâtel and ESMS) on export or import.
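The import of UTF-8 CSV files together with automatically captured audit information (who created an element and when, who last updated it) and a stored history can be sketched as below. The `code` key column and the in-memory store are hypothetical; the sketch parses CSV text directly so that it is self-contained, whereas a real file would be opened with `open(path, encoding="utf-8", newline="")` and passed to `csv.DictReader`.

```python
import csv
import io
from datetime import datetime, timezone


class MetadataStore:
    """Current metadata elements plus the superseded versions (the history)."""

    def __init__(self):
        self.current = {}
        self.history = []

    def upsert(self, code, attributes, user):
        now = datetime.now(timezone.utc)
        previous = self.current.get(code)
        if previous is not None:
            self.history.append(previous)  # keep the old version before replacing it
        self.current[code] = {
            **attributes,
            "code": code,
            # Creation data is kept from the first version; update data is refreshed.
            "created_by": previous["created_by"] if previous else user,
            "created_at": previous["created_at"] if previous else now,
            "updated_by": user,
            "updated_at": now,
        }


def import_csv(store, text, user):
    """Import rows of UTF-8 CSV text; each row must have a 'code' column."""
    for row in csv.DictReader(io.StringIO(text)):
        code = row.pop("code")
        store.upsert(code, row, user)


store = MetadataStore()
import_csv(store, "code,name_local,name_en\nV1,Käive,Turnover\n", "user_a")
import_csv(store, "code,name_local,name_en\nV1,Käive,Net turnover\n", "user_b")
print(store.current["V1"]["name_en"], store.current["V1"]["created_by"])  # Net turnover user_a
print(len(store.history))  # 1
```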
4.1.4 Data access authorization metadata creation
Authorization metadata for data access in the S-DWH is based on responsibility for the statistical activity in iMETA (domain manager/statistical activity manager) and on approval by the user's direct manager. Access rights to data follow the privileges of the S-DWH (for example, viewing and extracting data of a statistical activity from the collection system). In iMETA, data access authorization metadata can be created for the following S-DWH data: raw data from the collection system, data in the data staging area (data processing data), data in the data warehouse, and data cubes.
4.2 Metadata usage
4.2.1 Users
Metadata users in the source layer are statistical activity managers/domain managers, data warehouse developers/IT specialists and methodologists.
Metadata users in the integration layer are statistical activity managers/domain managers, data warehouse developers/IT specialists and data processing operators.
Metadata users in the interpretation and analysis layer are statistical activity managers/domain managers and analysts.
Metadata users in the access layer are mainly final/end users (researchers, students, organizations, etc.), statistical activity managers, methodologists and dissemination specialists.
4.2.2 Search and navigation
All metadata in the integrated metadata system can be searched with a full-text search function, covering all terminology (Neuchâtel) objects such as statistical activities, classifiers, concepts, units of measure, questionnaires, technical characteristics, etc.
In the metadata navigator, users can browse the properties and links of metadata objects: SQL objects, metadata elements, terminology objects and other objects (roles, users).
4.2.3 Metadata export (output generation)
The metadata system offers customized options for report generation. A report or screen view is used to generate metadata export files (output in CSV or XML format).
XML output is produced by describing the XML structure and the CSV file (choosing columns on the screen), mapping them to the metadata and generating the output (ESMS output transport).
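Generating CSV and XML export files from a chosen set of columns can be sketched with Python's standard `csv` and `xml.etree.ElementTree` modules. The element names and the flat XML structure below are hypothetical and do not reproduce the ESMS structure; they only show how one column selection can drive both output formats.

```python
import csv
import io
import xml.etree.ElementTree as ET


def export_csv(records, columns):
    """Write the chosen columns of metadata records as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def export_xml(records, columns, root_tag="metadata", item_tag="element"):
    """Write the chosen columns as a simple XML document (hypothetical structure)."""
    root = ET.Element(root_tag)
    for rec in records:
        item = ET.SubElement(root, item_tag)
        for col in columns:
            ET.SubElement(item, col).text = str(rec.get(col, ""))
    return ET.tostring(root, encoding="unicode")


records = [{"code": "V1", "name_en": "Turnover", "unit": "EUR"}]
print(export_csv(records, ["code", "name_en"]))
print(export_xml(records, ["code", "name_en"]))
```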
4.2.4 Other systems and internationality
Other systems use metadata from the metadata system, which is integrated with them.

Metadata is used in all data warehouse phases and tools: data extraction, transformation, loading, presentation, etc.

Metadata is retrieved and output through an API (combined according to needs).
The integrated metadata system can be used internationally (sharing system and context):

Multilingual by system and by context, enabling multilingual processing (including Unicode character sets).

Expandable by using different meta models.
4.3 Metadata maintenance
4.3.1 Maintenance of metadata objects (insert, update, delete, versioning)
In the integrated metadata system, metadata objects are created and changed (updated, deleted) according to business rules. Versioning rules are flexible and can differ between metadata objects.
Statistical metadata (also quality metadata – statistical activity attributes):

Statistical activity – overview of statistical activities, statistical activities
by domains and sub-domains, search, copy, description, associated
indicators, variables, classifiers, questionnaires.

Classifiers – maintenance of domains of classifiers, classifiers by
domains, version of classifiers: attributes, elements, variants, levels,
matching tables, indexes.

Concepts – maintenance of domains, sub-concepts, associated
concepts, associated statistical characteristics, vocabularies.

Statistical characteristics – maintenance of statistical units by type, subtypes, characteristics.

Units of measure - maintenance of types, associated statistical
characteristics.

Questionnaire – maintenance of questionnaires’ groups, versions of
questionnaires (periods, deadlines, version of statistical activities).

Legal acts – content of legal acts for producing statistics.
Technical metadata

Technical characteristics – maintenance of database metadata: descriptions of databases, objects (tables) and columns. Maintained manually in the iMETA application or by an automated process that scans database structures and imports their metadata into the metadata repository.
Process metadata

Process log (start time, end time, duration, who started)

Validation rules

Transformation-editing rules
Authorisation metadata

Users, roles, privileges and operations, as well as access to data in S-DWH applications.
Links between metadata objects:

Because all metadata is in one metadata repository, users can get an overview of which statistical variables are used in which statistical activities, and of which statistical characteristics describe which statistical unit.
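Because all links sit in one repository, such overviews amount to grouping link records in either direction. The sketch below uses hypothetical (statistical activity, variable) link pairs; the same grouping would give the unit-to-characteristics overview.

```python
from collections import defaultdict

# Hypothetical link records between terminology objects in the shared repository:
# (statistical activity, statistical variable) pairs.
LINKS = [
    ("Structural business statistics", "Turnover"),
    ("Structural business statistics", "Number of employees"),
    ("Short-term statistics", "Turnover"),
]


def group_links(pairs):
    """Group (left, right) link pairs into {left: sorted list of rights}."""
    grouped = defaultdict(set)
    for left, right in pairs:
        grouped[left].add(right)
    return {key: sorted(values) for key, values in grouped.items()}


variables_by_activity = group_links(LINKS)
# The same links read in the other direction: which activities use each variable.
activities_by_variable = group_links((var, act) for act, var in LINKS)
print(activities_by_variable["Turnover"])
# ['Short-term statistics', 'Structural business statistics']
```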
4.3.2 Users and rights (of metadata) management
The methodological unit is responsible for metadata management; different units are responsible for metadata input and updates.
The metadata system applies the following principles of user rights:

All users of the iMETA application can see all metadata in the metadata repository.

Rights to add, change or delete metadata are derived from the roles assigned to users.
A user's responsibility is limited to the statistical domain that he or she manages, and all rights are inherited by its sub-domains. User rights follow the S-DWH system operations of all S-DWH processes (and all layers): data extraction, data transformation, data loading and presentation.
The S-DWH has the following operations: reading metadata, creating data processing packages, accessing sensitive data, solving data processing tasks, scheduling packages, viewing logs, etc. User rights can be granted for operations on objects (privileges). Examples of privileges are creating a statistical activity, managing versions of a statistical activity, changing or deleting the description of a statistical activity, managing the variables of a statistical activity version, managing legal acts, managing questionnaires and managing roles. Users, roles and privileges are managed in the URMA application.
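The rights model described above (change rights derived from roles, limited to the managed domain and inherited by its sub-domains, while reading is open to all) can be sketched as a small role-based check. The privilege identifiers and the slash-separated domain paths are hypothetical conventions chosen for the example; URMA's actual data model is not described here.

```python
from dataclasses import dataclass, field


@dataclass
class Role:
    name: str
    privileges: set[str] = field(default_factory=set)  # e.g. "manage_legal_acts"


@dataclass
class User:
    login: str
    roles: list[Role] = field(default_factory=list)
    domains: set[str] = field(default_factory=set)  # statistical domains the user manages


def covers(managed, domain):
    """True if `domain` is the managed domain itself or one of its sub-domains."""
    return domain == managed or domain.startswith(managed + "/")


def can_change(user, privilege, domain):
    """Add/change/delete rights come from roles and are limited to managed domains.

    Reading needs no check, because all iMETA users can see all metadata.
    """
    has_privilege = any(privilege in role.privileges for role in user.roles)
    return has_privilege and any(covers(d, domain) for d in user.domains)


activity_manager = Role("statistical activity manager",
                        {"create_statistical_activity", "manage_variables"})
user = User("user_a", [activity_manager], {"Economy/Business"})
print(can_change(user, "manage_variables", "Economy/Business/Trade"))  # True
print(can_change(user, "manage_variables", "Population"))  # False
```

Matching on `managed + "/"` rather than a bare prefix keeps a domain such as "Economy/BusinessX" from being treated as a sub-domain of "Economy/Business".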
4.3.3 User guides
The iMETA user guide is available as online help. All employees of Statistics Estonia are also iMETA users: they have been granted access to the iMETA application, where they can see the online help and all the metadata.
4.4 Metadata evaluation
Metadata is evaluated through processes based on the metadata quality requirements.
Fill-in controls of metadata validation:

no missing required values; objects referenced in links must exist,

metadata element’s code must be unique,

data type format control (including conformity with the classifier).
Systematic built-in processes for managing the metadata evaluation workflow, which check the following (validation queries):

conformity with the meta models used (for example, a metadata object's characteristics, i.e. name (in the local language and English), type, identifier, value domain, links to other elements, and whether it is mandatory or optional, must conform to the models).
Organizational processes of metadata validation:

manual methodological review during approval, to harmonize statistical metadata,

testing of ETL metadata,

validation of data warehouse data model metadata by the data warehouse architect.
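The fill-in controls listed above (no missing required values, linked objects must exist, unique codes, format and classifier conformity) can be sketched as a single validation pass. The required field names, the code pattern and the use of a `type` classifier are hypothetical rules chosen for illustration; iMETA's actual rules are defined by its meta models.

```python
import re

# Hypothetical rules for one metadata object type, echoing the fill-in controls above.
REQUIRED = ("code", "name_local", "name_en", "type")
CODE_PATTERN = re.compile(r"^[A-Z][A-Z0-9_]*$")


def validate(objects, classifier_codes):
    """Return a list of (code, problem) tuples; an empty list means all checks passed."""
    problems = []
    seen = set()
    known = {o.get("code") for o in objects}
    for obj in objects:
        code = obj.get("code", "")
        for field_name in REQUIRED:
            if not obj.get(field_name):
                problems.append((code, f"missing required value: {field_name}"))
        if code in seen:
            problems.append((code, "code is not unique"))
        seen.add(code)
        if code and not CODE_PATTERN.match(code):
            problems.append((code, "code has invalid format"))
        if obj.get("type") and obj["type"] not in classifier_codes:
            problems.append((code, f"type {obj['type']!r} not in classifier"))
        for target in obj.get("links", []):
            if target not in known:
                problems.append((code, f"linked object {target!r} does not exist"))
    return problems


types = {"VARIABLE", "CLASSIFIER"}
objs = [
    {"code": "TURNOVER", "name_local": "Käive", "name_en": "Turnover",
     "type": "VARIABLE", "links": ["NACE"]},
    {"code": "NACE", "name_local": "EMTAK", "name_en": "NACE", "type": "CLASSIFIER"},
    {"code": "TURNOVER", "name_local": "", "name_en": "Turnover",
     "type": "MEASURE", "links": ["MISSING"]},
]
for problem in validate(objs, types):
    print(problem)
```

The first two objects pass; the third fails four checks (missing local name, duplicate code, type outside the classifier, link to a non-existent object).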
4.4.1 Use of standards
Several standards are used in describing metadata elements: ISO 11179 and SDMX (ESMS, ESQRS and EPMS).
The MMX meta model provides a storage mechanism for various knowledge models. All the meta models used in iMETA are stored in the MMX meta model, which is based on the third level of the Meta-Object Facility (an Object Management Group standard for model-driven engineering).
4.5 Metadata by metadata functionality groups and layers
Table 2 Metadata by metadata functionality groups and layers
2–Design
2.4 – design
frame and sample
methodology –
creates frame,
sample, stratum
metadata
1.6 – prepare business case – creates metadata
of statistical activity
2.2 – design variable descriptions – creates
variable metadata (quality metadata), classifier
metadata, creates algorithms, creates new
variable metadata (creating algorithms) using
available variables metadata, creates data
quality variables metadata
2.5 - design statistical processing methodology
- creates coding algorithms using classifiers and
coding tables metadata, creates imputation
algorithms uses imputation methods, creates
data processing algorithms (incl. aggregation)
2.6 – design production systems and workflow –
creates data warehouse data model metadata,
creates data staging area data model, creates
scheduling metadata
Phase
sub–process/metadata
1.3 – establish output
objectives - creates output
variable metadata
Phase
2–Design
1.3 – establish output objectives - creates output
variable metadata
2.1. – design outputs –creates output validation
metadata (algorithms), creates algorithms of
statistical confidentiality, creates dissemination
metadata (publication calendar); builds output
(output algorithms)
2.3 - design
data collection
methodology creates validation
rules of data
collection, design
data model of raw
data, creates pre–
fill metadata
sub–process/metadata
Access layer
3–Build
2.2 – design
variable
descriptions –
creates metadata
of IOR (Initial
observation
registry) variable
Phase
Interpretation and data analysis
layer
sub–process/metadata
2.1 – design outputs -builds
output, creates technical
metadata of output
2.6 – design production systems
and workflow - creates statistical
metadata of workflow
3.3 – configure workflows –
creates technical metadata of
workflow
7.1 – update output systems –
creates process metadata (when,
who, etc.)
7–Disseminate
sub–process/metadata
1–Specify
needs
Phase
2–Design
Metadata
creation
Integration layer
1–Specify
needs
Source layer
7.2 – produce dissemination –
creates process metadata
7.3 – manage release of
dissemination products – creates
process metadata
7.4 – promote dissemination –
creates process metadata
7.5 – manage user support–
creates user metadata, creates
process metadata
4–Collect
4.2 – set up
collection –
creates process
metadata of set
up collection
3–Build
2.1 – design outputs –
creates output validation
metadata (algorithms),
creates algorithms of
statistical confidentiality,
creates dissemination
metadata (publication
calendar); builds output
(output algorithms);
5.3 – review, validate & edit; creates quality
variables (edit failure rate) and creates process
metadata
5.4 – impute; creates quality variables, creates
quality variable metadata (imputation rate) and
process metadata (how, who, when etc)
2–Design
5.2 – classify & code; creates process metadata
5.5 – derive new variables and statistical units –
creates process metadata, creates quality
variable metadata (new variables and new
statistical units rate)
6.1 – prepare draft output; creates output
variable metadata (algorithms) using output
variables metadata, classifiers metadata,
creates tables titles (using output variable
metadata)
2.4 – design frame and
sample methodology –
creates sample
metadata
2.6 – design production
systems and workflow –
creates statistical
metadata of workflow
5.7 – calculate aggregate – creates process
metadata
5.8 – finalize data files – creates data finalizing
metadata (date, person etc.), creates data
loading to SDW metadata (logs, date, person
etc.) .
2.2 – design variable
description – creates
new variable metadata.
2.5 – design statistical
processing methodology
– describe
methodological rules for
analyse
5.6 – calculate weights, - creates process
metadata
4.3 – run
collection –
creates process
metadata (when,
who, where) of
data collection
4.4 – finalize
collection–
creates finalize
metadata
3.3 – configure workflows – creates scheduling
technical metadata.
3–Build
3.3 – configure
workflows –
creates technical
metadata of IOR,
creates technical
metadata of data
collection
instruments
Interpretation and data analysis
layer
5.1 – integrate data; creates process metadata
(where data is from data source metadata) ,
creates quality indicators for integration process,
5–Process
3.1 – build data
collection
instrument –
creates data
collection
structure
technical
metadata
Integration layer
6–Analyze
3–Build
Source layer
3.3 – configure
workflows - creates
output technical
metadata
Access layer
Source layer
Integration layer
Interpretation and data analysis
layer
4–Collect
4.1 – select sample –
creates process
metadata of sample
taking using 2.4
metadata
5–Process
5.1 – integrate data;
creates process
metadata (where data is
from data source
metadata), creates
quality indicators for
integration process,
5.5 – derive new
variables and statistical
units – creates process
metadata, creates
quality variable
metadata (new variables
and new statistical units
rate)
5.6 – calculate weights –
creates process
metadata
5.7 – calculate
aggregate – creates
process metadata
Access layer
Source layer
Integration layer
Interpretation and data analysis
layer
6– Analyze
6.1 – prepare draft
output – creates
process, technical
metadata of output
6.2 – validate outputs –
creates process
metadata
6.3 – scrutinize and
explain – creates
process metadata
6.4 – apply disclosure
control – creates
process metadata
9-Evaluate
7-Disseminate
6.5 – finalize outputs –
creates process
metadata
7.1 – update output
systems – creates
update process
metadata by using
output metadata (output
variable metadata,
classifiers metadata).
9.1 – gather evaluation
inputs – creates process
metadata
9.2 – conduct evaluation
– creates process
metadata
Access layer
4.3 – run
collection – uses
variable
metadata, IOR
metadata
4.4 – finalize
collection – uses
IOR metadata
2.2 – design variable descriptions –
creates new variable metadata (creating
algorithms) using available variables
metadata
2.5 - design statistical processing
methodology – uses variable metadata
2.6 – design production systems and
workflow –uses statistical processing
metadata
3–Build
2.1 – design outputs – uses output variable
3.3 – configure workflows – uses
scheduling statistical metadata
sub–process/metadata
1.5 – check data
availability – uses
variable descriptions,
uses variable and data
element link descriptions
3.3 – configure
workflows - uses
statistical metadata of
workflow
4.1 – select sample –
uses 2.4 methodological
metadata
5.1 – integrate data;,
uses data model of raw
data, uses variable
metadata
5.5 – derive new
variables and statistical
units – uses new
variable/unit creating
algorithms
Access layer
Phase
sub–process/metadata
7.1 – update output systems,uses output metadata
7–Disseminate
1–Specify
needs
1.5 – check data availability – uses metadata
of available administrative data
Phase
4–Collect
4.2 – set up
collection – uses
2.6 pre–fill
metadata,
metadata of
collection
instrument, , uses
4.1 select sample
metadata
Metadata is multilingual (English, local language), possible
to share internationally via unified services with standard
format (like XML, SDMX)
2–Design
3.3 – configure
workflows – uses
metadata of IOR
variables
All S–DWH uses the same metadata repository
Interpretation and data analysis
layer
5–Process
4–Collect
sub–process/metadata
3–Build
Phase
3–Build
Metadata
usages
Integration layer
1–Specify
needs
Source layer
7.2 – produce dissemination –
uses output metadata
7.3 – manage release of
dissemination products – uses
output metadata
7.4 – promote dissemination –
uses output metadata
7.5 – manage user support –
uses output metadata, user
metadata
Integration layer
Interpretation and data analysis
layer
5.1 – integrate data; uses, sample
metadata, uses data model of raw data,
uses variable metadata
6.1 – prepare draft
output - uses output
metadata
For data profiling uses data quality
variables metadata, uses algorithms for
integration (by using priority metadata)
6.3 – scrutinize and
explain – uses output
metadata.
5.2 – classify & code; uses coding
algorithms, classifiers, coding tables’
metadata.
6.2 – validate outputs –
uses validation
algorithms
5.4 – impute; uses imputation algorithms
6–Analyze
5–Process
Source layer
5.5 – derive new variables and statistical
units– uses new variable creating
algorithms
Metadata
maintenance
Maintain (create, update,
delete, versioning) source
metadata (data sources
metadata, variables
Maintain (create, update, delete, versioning) integration
metadata (data processing algorithms, data warehouse
data models etc.).
User rights are according to the S–DWH system
7–Disseminate
6.1 – prepare draft output; uses output
variables metadata, classifiers metadata,
create tables titles (uses output variable
metadata)
7.1 – update output
systems – uses output
metadata.
9-Evaluate
6–Analyze
5.8 – finalize data files –uses data
warehouse data model metadata
6.4 – apply disclosure
control; uses algorithms
of statistical
confidentiality,
6.5 – finalize outputs –
uses output metadata,
quality variables
metadata.
5.6 – calculate weights; uses stratum and
frame metadata
5.7 – calculate aggregate; uses methods
for aggregate, uses variable metadata,
classifiers metadata
Access layer
9.1 – gather evaluation
inputs – uses evaluation
metadata
9.2 – conduct evaluation
– uses evaluation
metadata
Maintain (create, update, delete,
versioning) interpretation metadata
(outputs, output variables,
classifier, clarifying metadata).
Maintain (create, update, delete, versioning)
access metadata (output, product, publication
calendar, user support). Manual, automated.
User rights are according to the S–DWH
Source layer
metadata, sample metadata
etc.). Mostly manually
managed, technical
automated.
Integration layer
Interpretation and data analysis
layer
operations of all S–DWH processes (and all layers) .
Manual, automated.
Integration metadata can be stored in different meta
models (managing meta models). Meta model for
statistical metadata, process metadata etc.
User rights are according to the
S-DWH system operations of all
S-DWH processes.
Evaluate and assure quality metadata of integration layer
by – fill-in controls of metadata validation
Evaluate and assure quality
metadata of interpretation and data
analysis layer by – fill-in controls of
metadata validation
Access layer
system operations of all S–DWH processes.
User rights are according to
the S–DWH system
operations of all S-DWH
processes – data extraction,
data transformation, data
loading and presentation.
S–DWH has following
operations – read metadata,
create data processing
packages, access to delicate
data, solve data processing
tasks, schedule packages.
Metadata
evaluation
Evaluate and assure quality
metadata of source layer by
– fill-in controls of metadata
validation
Systematic built–in
processes for managing the
workflow of quality
assurance of metadata
Organizational processes of
metadata validation
Systematic built-in processes for managing the workflow of
quality assurance of metadata
Organizational processes of metadata validation
Systematic built–in processes for
managing the workflow of quality
assurance of metadata
Organizational processes of
metadata validation
Evaluate and assure quality metadata of data
access layer by – fill-in controls of metadata
validation
Systematic built-in processes for managing
the workflow of quality assurance of metadata
Organizational processes of metadata
validation
5. Conclusion
Observing how metadata is handled in the integrated metadata system of the S-DWH reveals many benefits of such a system. Benefits for users and systems include unified metadata, integrated metadata, metadata accessibility and the availability of metadata for analysis. Unifying metadata makes it unambiguous. Integrating metadata gives access to all relations between different kinds of metadata and metadata elements, so that different kinds of analysis can be based on these relations.